EtlEmrRunner fails at step [enrich] spark: Enriched HDFS -> S3 with error "Input path does not exist: hdfs://ip-172-31-10-133.us-east-2.compute.internal:8020/tmp/1d59dbe9-98f2-473f-8d65-288f5019fdca/files"

ihor · August 9, 2020, 8:46pm

@ckrishnamoorthy, according to another post of yours, you are using Amazon Kinesis Firehose to upload the data to S3. I suspect, therefore, that your data is not in the right format/location. I also do not follow the architecture shown below, if that is what you are running here as well.

Kinesis ScalaCollector -> EnrichEmrEtlrunner -> Amazon Kinesis Firehose -> S3 Enriched records -> PostgresqlLoader -> PostgresqlDB

Did you mean to show “Firehose” before EmrEtlRunner (EER)?

To upload the streamed data to S3 in a form that EmrEtlRunner can work with, you would have to use the dedicated application, Kinesis S3 Loader.

Your EER configuration file is very hard to read. To retain the indentation, could you place your YAML or other code between pairs of ``` (triple backticks, Markdown)?
