EmrEtlRunner not loading data into RedShift

AllenWeieiei · November 8, 2019, 8:58pm

Working with our DevOps team, we want to minimize resources, risk, and so on. One question about our current process: what do the 2 enrich steps (Stream Enrich after the collector, and the EmrEtlRunner enrich step) actually do? Do they add new fields? Do they increase the number of records?

Thanks!

ihor · November 8, 2019, 10:01pm

@AllenWeieiei, they can do a lot of things depending on your needs. In broad terms, the following 2 tasks are performed:

1. data validation against the corresponding JSON schemas (data quality)
2. widening of the captured data with additional info (configurable)

The 1st item is a must. Any data failing validation will be rejected and (depending on your pipeline architecture) set aside for further examination/recovery/reprocessing.

The 2nd item allows you to enhance your data with additional values, including those coming from 3rd parties. These enrichments are configurable and optional. The Snowplow pipeline is very flexible and can be customized extensively to your specific needs. Here's the link to the various enrichments you can add to the pipeline: https://github.com/snowplow/snowplow/wiki/Configurable-enrichments.
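
To make the two tasks concrete, here is a rough sketch in Python. It is not Snowplow's actual code: the `REQUIRED_FIELDS` check is a simplified stand-in for JSON schema validation, and `validate`, `add_url_host`, `enrich`, and the event fields are all hypothetical names made up for illustration.

```python
from urllib.parse import urlparse

# Illustrative sketch only, NOT Snowplow's implementation.
# REQUIRED_FIELDS is a hypothetical stand-in for a real JSON schema.
REQUIRED_FIELDS = {"event_id": str, "page_url": str}


def validate(event):
    """Return a list of validation errors; an empty list means the event is valid."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    return errors


def add_url_host(event):
    """Hypothetical optional enrichment: derive a new field from an existing one."""
    enriched = dict(event)  # copy so the raw event is left untouched
    enriched["page_urlhost"] = urlparse(event["page_url"]).hostname
    return enriched


def enrich(events, enrichments):
    """Validate every event, then apply the configured enrichments to valid ones."""
    good, bad = [], []
    for event in events:
        errors = validate(event)
        if errors:
            # Rejected events are set aside together with the reason.
            bad.append({"event": event, "errors": errors})
            continue
        for enrichment in enrichments:
            event = enrichment(event)
        good.append(event)
    return good, bad


raw_events = [
    {"event_id": "e1", "page_url": "https://example.com/pricing"},
    {"event_id": "e2"},  # fails validation: page_url is missing
]
good, bad = enrich(raw_events, [add_url_host])
print(len(good), len(bad))      # 1 1
print(good[0]["page_urlhost"])  # example.com
```

In this sketch every input event ends up in exactly one of `good` or `bad`, so the number of records does not grow. Valid events only become wider, gaining the fields that the enabled enrichments add.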

AllenWeieiei · November 11, 2019, 2:14pm

Great! Thanks!
