Understanding my options for Transforming/Loading

istreeter · October 18, 2022, 6:07am

It is possible to use transformer-kinesis while simultaneously creating an S3 archive of the TSV-format events. This will be easier to explain if I first summarise some of the processes that can run downstream of Enrich:

Process 1: Streaming transformer - reads from Kinesis and writes micro-batches of transformed data to S3
Process 2: Loading events directly to S3 - reads from Kinesis and writes the events to S3 in TSV format, using either our custom S3 loader or Firehose
Process 3: Spark/EMR transformer - reads the output of process 2 in batches, and re-writes it to S3 in transformed format

Normally we recommend running either just process 1, or processes 2 and 3 together. Process 1 (streaming transformer) is the cheapest and most direct way to get events transformed and ready for loading, whereas process 2 + 3 (S3 loader + EMR transformer) is more mature Snowplow tech and is proven to work on very high volume pipelines.

However, you could choose to run Process 1 and Process 2 in parallel. Kinesis lets you have multiple consumers of the same stream, and each consumer sees every single event. So your streaming transformer can transform all events and prepare them for the warehouse, while at the same time the S3 loader (or Firehose) reads the same events from Kinesis and writes them to S3 in TSV format.
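
To make the "multiple consumers, each sees every event" idea concrete, here is a minimal in-memory sketch. It is a toy model only, not the Kinesis API or any Snowplow code: `ToyStream`, `Consumer`, `Transformer` and `Archiver` are hypothetical names, and the JSON "transformation" is just a placeholder. The point it illustrates is that reading does not remove records from the stream, and each consumer keeps its own read position.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToyStream:
    """In-memory stand-in for a stream (hypothetical, not the real Kinesis API)."""
    records: list = field(default_factory=list)

    def put(self, record: str) -> None:
        self.records.append(record)

    def read_from(self, position: int) -> list:
        # Reading never deletes records, so any number of consumers can read them.
        return self.records[position:]


@dataclass
class Consumer:
    """Hypothetical consumer that tracks its own read position."""
    name: str
    position: int = 0
    output: list = field(default_factory=list)

    def handle(self, record: str) -> str:
        return record

    def poll(self, stream: ToyStream) -> None:
        new_records = stream.read_from(self.position)
        self.output.extend(self.handle(r) for r in new_records)
        self.position += len(new_records)


class Transformer(Consumer):
    """Stands in for process 1: placeholder transform of a TSV line into JSON."""
    def handle(self, record: str) -> str:
        return json.dumps(record.split("\t"))


class Archiver(Consumer):
    """Stands in for process 2: keeps the TSV line unchanged."""


stream = ToyStream()
for line in ["app1\tpage_view\tu1", "app1\tpage_ping\tu1", "app2\tpage_view\tu2"]:
    stream.put(line)

transformer = Transformer(name="transformer")
archiver = Archiver(name="archiver")
transformer.poll(stream)
archiver.poll(stream)

# Both consumers processed all three events, independently of each other.
print(len(transformer.output), len(archiver.output))
```

In this model the archiver's output is exactly the original TSV lines, while the transformer's output is a transformed copy of the same events, which mirrors running Process 1 and Process 2 side by side.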

By running Process 1 + Process 2, you get the benefits of cheap, fast warehouse loading via the streaming transformer, and you also get a TSV archive, which means you keep the option of doing a full load to new destinations in future, if you ever need to.
