Skip archive_enriched

RichardJ · August 7, 2017, 6:23pm

What exactly does the option "--skip archive_enriched" do? Does it only skip archiving of enriched/good?

ihor · August 7, 2017, 6:40pm

Yes, it skips step 12 in the dataflow diagram. Essentially, the data will be loaded to the storage target, but neither the enriched nor the shredded files will be archived.

PS. The diagram is fully accurate for Snowplow releases up to R87.

RichardJ · August 8, 2017, 3:33pm

Thanks Ihor! Your response is very helpful.

Do you have a similar dataflow diagram for Snowplow Stream processing?

ihor · August 8, 2017, 4:42pm

@RichardJ,

Not in that format. We do, however, have a general real-time pipeline (lambda) architecture diagram: How to setup a Lambda architecture for Snowplow. It depicts just one of the possible approaches. Since that diagram was posted, it has become possible to use stream-enriched data in the batch branch, which avoids running the enrichment process in batch: run EmrEtlRunner with --skip staging,enrich.
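
For anyone scripting these runs, here is a minimal Python sketch of how the two --skip variants from this thread could be assembled. The helper name `emr_etl_runner_command`, the binary path and the config/resolver file names are placeholders I made up for illustration, and the exact invocation may differ between releases, so check it against your own setup.

```python
import shlex


def emr_etl_runner_command(skip_steps,
                           binary="./snowplow-emr-etl-runner",  # placeholder path
                           config="config.yml",                 # placeholder file name
                           resolver="resolver.json"):           # placeholder file name
    """Build an EmrEtlRunner argument list (hypothetical helper).

    Multiple skipped steps are joined into one comma-separated --skip value,
    as in "--skip staging,enrich".
    """
    args = [binary, "--config", config, "--resolver", resolver]
    if skip_steps:
        args += ["--skip", ",".join(skip_steps)]
    return args


# Batch branch that consumes stream-enriched data: no staging, no enrichment.
batch_after_stream_enrich = emr_etl_runner_command(["staging", "enrich"])

# Load to the storage target but leave enriched and shredded files unarchived.
no_archive = emr_etl_runner_command(["archive_enriched"])

print(shlex.join(batch_after_stream_enrich))
print(shlex.join(no_archive))
```

The argument lists can be passed directly to `subprocess.run`; `shlex.join` is only used here to print them in a copy-pasteable form.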
