You can read what happened in the post below. Our Iglu server was unavailable for about a week, so all the unstructured events that couldn't be validated against Iglu didn't make it into Redshift; the structured events were fine and loaded. Any advice on the best way to rerun the data for the last 7 days without causing duplicates in Redshift? Would turning on de-duplication in enrichment cause only the rows that are missing from Redshift to be loaded? The other option is to just re-process everything and then run the de-duplication SQL I saw in another post.
I don't think de-duplication in enrichment would work, since we've never had it on and it only works within a single batch run, right? It doesn't compare what's in Redshift against what's being run through the ETL. We'd have to run all the logs since 6/14 as one big batch for it to work, and even then we'd still have dupes in Redshift from the previous good loads.
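For context, the in-database cleanup after a re-run is usually along these lines. This is only a sketch, not the exact SQL from the other post: it assumes the standard Snowplow `atomic.events` table and that the re-loaded rows are exact duplicates of previously loaded rows (since Redshift has no row identifier, the common pattern is to rebuild the table from `SELECT DISTINCT` rather than delete in place).

```sql
-- Sketch only: collapse exact duplicate rows in atomic.events.
-- Assumes re-processed rows are byte-for-byte identical to the originals;
-- adjust the table name and add a WHERE clause on collector_tstamp to
-- limit the rebuild to the affected date range if the table is large.
BEGIN;

-- Stage one copy of each distinct row.
CREATE TEMP TABLE events_dedup AS
SELECT DISTINCT *
FROM atomic.events;

-- Swap the deduplicated rows back in.
DELETE FROM atomic.events;

INSERT INTO atomic.events
SELECT * FROM events_dedup;

COMMIT;
```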
Because you are only using GETs, you won't encounter the duplication problem that can occur when recovering bad events from POST payloads (see the caveats section in the documentation).