A few times, enriched events were not copied to S3 because of this error:
```
Exception in thread "main" java.io.IOException: Error opening job jar: /usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar
    at org.apache.hadoop.util.RunJar.run(RunJar.java:160)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.util.zip.ZipException: zip file is empty
    at java.util.zip.ZipFile.open(Native Method)
```
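One way to confirm the cause is to check the jar from the stack trace directly on the EMR master node. This is a hypothetical diagnostic sketch, not anything from Snowplow's tooling; it assumes `unzip` is available on the node:

```shell
# Check whether the jar RunJar complained about is a readable, non-empty
# zip archive. A zero-byte file reproduces the "zip file is empty" error.
jar_ok() {
  [ -s "$1" ] && unzip -l "$1" >/dev/null 2>&1
}

# On the EMR master node:
# jar_ok /usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar \
#   || echo "s3-dist-cp.jar is missing, empty, or corrupt"
```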
Re-running EmrEtlRunner with the staging step skipped fixes the issue. I found troubleshooting tips here:
I guess such issues are difficult to settle definitively, but most users presumably drive their Snowplow pipeline from a task scheduler like Jenkins, so re-running manually is not ideal: the pipeline is not self-healing. Has anyone else hit this? Any ideas?
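Until there is a proper fix, one stopgap is to wrap the scheduler's invocation in a retry that skips staging on the second attempt (since the raw logs have already been moved by then). A minimal sketch; the EmrEtlRunner invocation and the `--skip staging` flag follow the standard CLI, but adapt them to your setup:

```shell
# Run the given pipeline command; if it fails, retry once with
# "--skip staging" appended, since staging already completed.
retry_with_skip_staging() {
  "$@" && return 0
  echo "first run failed; retrying with --skip staging" >&2
  "$@" --skip staging
}

# Example invocation (paths/flags are assumptions, adjust as needed):
# retry_with_skip_staging ./snowplow-emr-etl-runner --config config.yml
```

A single retry is deliberate: if the second run also fails, the job should surface as failed in Jenkins rather than loop.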
Here’s my config:
```yaml
emr:
  ami_version: 4.5.0
  region: eu-central-1
  jobflow_role: EMR_EC2_DefaultRole
  service_role: EMR_DefaultRole
  placement:
  ec2_subnet_id: subnet-[...]
  ec2_key_name: my_key
  bootstrap: []
  software:
    hbase:
    lingual:
  jobflow:
    master_instance_type: m4.large
    core_instance_count: 3
    core_instance_type: c3.4xlarge
    task_instance_count: 0
    task_instance_type: c4.large
    task_instance_bid:
  bootstrap_failure_tries: 3
  additional_info:
collectors:
  format: clj-tomcat
enrich:
  job_name: snowplow ETL
  versions:
    hadoop_enrich: 1.8.0
    hadoop_shred: 0.10.0
    hadoop_elasticsearch: 0.1.0
  continue_on_unexpected_error: false
  output_compression: GZIP
```