Event recovery with Scala Stream Collector

abrenaut · April 6, 2018, 6:38pm

Hi,

I’m trying to reprocess events in the enriched/bad buckets following this tutorial.

The Hadoop event recovery job runs fine but when I launch the EmrEtlRunner job using my recovered events buckets the EMR job fails at the “Elasticity S3DistCp Step: Enriched HDFS -> S3” step:
Error running job ... Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-172-20-57-52.us-east-2.compute.internal:8020/tmp/847d0f19-e2e7-4a86-8f22-2e2db445a993/files

I’m guessing this is because the enrich step expects files in the lzo format but instead the recovered events are base64 encoded in txt files.

Is there a way to make the event recovery job works with events collected using the Scala Stream Collector ?

Thanks,
Arthur

Topic		Replies	Views
Unable to recover bad events due to a missing schema Enrichment	3	1454	June 26, 2020
Resubmit record from stream enrich Enrichment	1	1645	September 1, 2020
EmrEtlRunner fails when copying enriched events to S3 Enrichment	1	1615	October 28, 2016
Enrich Raw Events fails due to "Not a file: hdfs" -- Clojure connector -- EMR ETL Runner Troubleshooting	11	1916	September 27, 2017
Error On EmrEtlRunner Enrichment	5	2625	May 21, 2020

Event recovery with Scala Stream Collector

Related topics