Hi All,
We are working on upgrading our Snowplow real-time events pipeline and are looking to introduce better engineering standards to our deployment.
We have a standard pipeline: collector → enrich → s3-loader → RDB loader → Redshift.
We have the same deployment across three environments: DEV, STAGING and PROD. We have two AWS accounts (effectively two VPCs): one for DEV + STAGING and another for PROD.
For deployment and testing we would like to replicate PROD events to the DEV + STAGING environments. What do you think is the best option for this:
- Replicate `collector-good-stream-prod` to `collector-good-stream-dev`? We tried this with a Flink application, but all the production events ended up in `enrich-bad-bucket-dev`.
- Read the raw events from `collector-good-bucket-prod` (how?) and use those to replay/replicate across accounts?
- Or maybe there's a third solution that is better than the above suggestions?
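For context, here is roughly what we have in mind for the first option, as a minimal plain-boto3 sketch rather than Flink. The dev-account role ARN and session name are made up, and a real deployment would use KCL or Lambda for checkpointing; the key point is forwarding the collector's binary Thrift payload byte-for-byte, since re-serialising it is (we suspect) why our replayed events failed enrichment:

```python
# Minimal sketch of replicating the prod collector stream into the dev
# account. Stream names are from our setup; the role ARN is hypothetical.

PROD_STREAM = "collector-good-stream-prod"
DEV_STREAM = "collector-good-stream-dev"
DEV_ROLE_ARN = "arn:aws:iam::111111111111:role/collector-replayer"  # hypothetical


def to_put_records(records):
    """Map get_records output to put_records input.

    The collector payload is a binary Thrift record: forward Data
    byte-for-byte and keep the original partition key. Re-encoding the
    payload (e.g. decoding it as a string) is a common reason replayed
    events fail enrichment and land in the bad stream/bucket.
    """
    return [{"Data": r["Data"], "PartitionKey": r["PartitionKey"]} for r in records]


def dev_kinesis_client():
    import boto3  # imported lazily so the helper above works without AWS deps

    # Assume a role in the dev account to write across accounts.
    creds = boto3.client("sts").assume_role(
        RoleArn=DEV_ROLE_ARN, RoleSessionName="prod-replay"
    )["Credentials"]
    return boto3.client(
        "kinesis",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )


def replicate_shard(shard_id):
    import boto3

    src = boto3.client("kinesis")
    dst = dev_kinesis_client()
    it = src.get_shard_iterator(
        StreamName=PROD_STREAM, ShardId=shard_id, ShardIteratorType="LATEST"
    )["ShardIterator"]
    while it:
        resp = src.get_records(ShardIterator=it, Limit=500)
        batch = to_put_records(resp["Records"])
        if batch:
            # put_records accepts at most 500 records per call.
            dst.put_records(StreamName=DEV_STREAM, Records=batch)
        it = resp.get("NextShardIterator")
```

This is per-shard with no resharding or failure handling, so it is only meant to illustrate the byte-for-byte forwarding, not to be the production replicator.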
Our goal here is for the dev/staging environments to have both production events and testing events. The difference from the prod environment is that we would keep the data for a much shorter time (e.g. 2 weeks), but still get a feel for the pipeline in staging before we roll the upgrade out to prod.
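For the shorter retention, assuming we enforce it at the S3 buckets in the dev account, we are thinking of an S3 lifecycle rule along these lines (the rule ID is a placeholder, and `apply_lifecycle` would be run once per bucket):

```python
# Sketch: expire replicated data after 14 days via an S3 lifecycle rule.

LIFECYCLE_14_DAYS = {
    "Rules": [
        {
            "ID": "expire-after-two-weeks",  # placeholder rule name
            "Filter": {"Prefix": ""},        # apply to the whole bucket
            "Status": "Enabled",
            "Expiration": {"Days": 14},
        }
    ]
}


def apply_lifecycle(bucket_name):
    import boto3  # lazy import; only needed when applying to a real bucket

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket_name, LifecycleConfiguration=LIFECYCLE_14_DAYS
    )
```

Would that be the recommended way to keep dev/staging cheap, or is there a better pattern for this?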