Replicate collector events

Hi All,

We are upgrading our Snowplow real-time (RT) events pipeline and want to introduce better engineering standards to our deployment.

We have a standard pipeline: collector → enrich → s3-loader → RDB loader → Redshift

We have the same deployment across 3 environments: DEV, STAGING and PROD.

We have 2 AWS accounts (effectively 2 VPCs): one for DEV + STAGING and another for PROD.

For deployment and testing we would like to replicate PROD events to the DEV + STAGING env.

What do you think is the best option for this:

  • Replicate collector-good-stream-prod to collector-good-stream-dev? We tried this with a Flink application, but all the production events ended up in enrich-bad-bucket-dev.
  • Read the raw events from collector-good-bucket-prod (how?) and use those to replay / replicate across accounts?
  • Or maybe there’s a third solution that is better than the two above.

Our goal is for the dev / staging environments to receive both the production events and our testing events. The difference from prod is that we would keep the data for a much shorter time (e.g. 2 weeks), but we would get a 'feel' for the pipeline in staging before we roll the upgrade out to prod.

If you want data to propagate all the way through the pipeline, the first option (replicating the stream) is the one I’d go for. Depending on your volume, I’d skip Flink and go straight for a simple Lambda that takes each message from the raw topic and pushes it into the raw-dev topic. As long as the bytes are the same, you shouldn’t have any issues with this approach.
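A minimal sketch of such a forwarder in Python, under a few assumptions: the function names (`to_put_entries`, `forward`) are hypothetical, and `kinesis_client` is assumed to be a boto3 Kinesis client with permission to write to the target stream. The detail worth noting is that a Lambda Kinesis event delivers each record's data base64-encoded, so it has to be decoded back to the original bytes before it is forwarded. Forwarding the encoded string, or decoding the payload as text anywhere along the way, would change the bytes that enrich sees.

```python
import base64

def to_put_entries(event):
    """Turn a Lambda Kinesis event into PutRecords entries, keeping payload bytes intact."""
    entries = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        entries.append({
            # The event carries the payload base64-encoded; decode to the raw bytes.
            "Data": base64.b64decode(kinesis["data"]),
            "PartitionKey": kinesis["partitionKey"],
        })
    return entries

def forward(event, kinesis_client, target_stream):
    """Forward all records to target_stream; kinesis_client is an injected boto3 client."""
    entries = to_put_entries(event)
    # PutRecords accepts at most 500 records per request.
    for start in range(0, len(entries), 500):
        batch = entries[start:start + 500]
        response = kinesis_client.put_records(StreamName=target_stream, Records=batch)
        if response.get("FailedRecordCount", 0):
            # Fail loudly so the batch is retried rather than silently dropped.
            raise RuntimeError("some records were not written to " + target_stream)
    return len(entries)
```

Passing the client in as an argument keeps the function independent of how credentials are obtained, which is the part that differs most between setups.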


Hi Mike,

Thank you for your answer.

Unfortunately, Kinesis triggers / targets and Lambdas don’t work across accounts. Our setup has prod in one AWS account and dev + staging in another, which is why I tried using Flink.

Thank you for confirming that copying the raw events from the production stream to the development / staging streams is the right idea. On the first attempt all the events went into the staging-enriched-bad-bucket, so I’m not sure how to copy the events’ bytes correctly, or how to debug the enricher to get to the root cause.

If you have any knowledge that can help us with this, it would be much appreciated.
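One way to start on the root cause is to look at what the bad rows themselves say. The sketch below assumes the files in the bad bucket are newline-delimited, self-describing JSON (a `schema` field plus a `data` object containing a `failure`), which is the format recent Enrich versions emit; the function name `summarise_bad_rows` is hypothetical, and the lines would first need to be downloaded and decompressed.

```python
import json
from collections import Counter

def summarise_bad_rows(lines):
    """Count bad rows by failure type and keep one sample failure per type."""
    counts = Counter()
    samples = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        schema = row.get("schema", "unknown")
        # Schema URIs look like iglu:<vendor>/<name>/jsonschema/<version>; keep <name>.
        parts = schema.split("/")
        kind = parts[1] if len(parts) == 4 else schema
        counts[kind] += 1
        # Remember the first failure seen for each type, for inspection.
        samples.setdefault(kind, row.get("data", {}).get("failure"))
    return counts, samples
```

If nearly every row turns out to be a payload format violation, that would suggest enrich could not decode the forwarded bytes as a collector payload, which points at the copy step rather than at the enrich configuration.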