Shred problems using the batch pipeline

I’ve been struggling with this issue for two days on our dev instance of the Snowplow batch pipeline. I’ve tried all the tricks that have fixed enrichment or shredding issues for me in the past, but nothing is working, and I’ve been using Snowplow for four years. There’s nothing in the EMR logs that tells me what the issue is. I’ve tried the following so far:

  • Cleaned out the good/bad/archive folders for the shredded output in S3. Too many files in those folders has tripped up Snowplow for us before, especially since we are not on s3a yet (roughly the cleanup sketched after this list).
  • We are behind on updates, but production is working fine so far; it’s only dev that fails. The EmrEtlRunner version is shown below, and prod and dev are in sync version-wise.
  • Tried increasing the number of cores in case there is simply a lot of data to shred, but got the same failure.
  • Gone through the troubleshooting doc for step failures and tried everything in it.
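
For reference, the S3 cleanup from the first bullet looked roughly like this; the bucket name and prefixes below are placeholders, not our real paths (take yours from config.yml):

```bash
# Placeholder bucket/prefixes -- substitute the shredded paths from config.yml.
BUCKET=s3://sp-dev-pipeline

# Empty the shredded good/bad folders so the shred step starts from scratch.
aws s3 rm "$BUCKET/shredded/good/" --recursive
aws s3 rm "$BUCKET/shredded/bad/"  --recursive

# Sanity check: large file counts in these folders have bitten us before.
aws s3 ls "$BUCKET/shredded/good/" --recursive | wc -l
```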
./snowplow-emr-etl-runner --version
uri:classloader:/gems/avro-1.8.1/lib/avro/schema.rb:350: warning: constant ::Fixnum is deprecated
snowplow-emr-etl-runner 0.33.1

Snowplow::EmrEtlRunner::EmrExecutionError (EMR jobflow j-26AORV9NANGGB failed, check Amazon EMR console and Hadoop logs for details (help: https://github.com/snowplow/snowplow/wiki/Troubleshooting-jobs-on-Elastic-MapReduce). Data files not archived.
Snowplow DEV ETL: TERMINATING [STEP_FAILURE] ~ elapsed time n/a [2020-11-18 15:20:20 +0000 - ]
 - 1. Elasticity S3DistCp Step: Enriched S3 -> HDFS: COMPLETED ~ 00:00:50 [2020-11-18 15:20:22 +0000 - 2020-11-18 15:21:13 +0000]
 - 2. Elasticity Spark Step: Shred Enriched Events: FAILED ~ 00:12:10 [2020-11-18 15:21:13 +0000 - 2020-11-18 15:33:23 +0000]
 - 3. Elasticity S3DistCp Step: Shredded S3 -> Shredded Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 4. Elasticity S3DistCp Step: Enriched S3 -> S3 Enriched Archive: CANCELLED ~ elapsed time n/a [ - ]
 - 5. Elasticity Custom Jar Step: Load Data Warehouse Storage Target: CANCELLED ~ elapsed time n/a [ - ]
 - 6. Elasticity S3DistCp Step: Raw Staging S3 -> Raw Archive S3: CANCELLED ~ elapsed time n/a [ - ]
 - 7. Elasticity S3DistCp Step: Shredded HDFS _SUCCESS -> S3: CANCELLED ~ elapsed time n/a [ - ]
 - 8. Elasticity S3DistCp Step: Shredded HDFS -> S3: CANCELLED ~ elapsed time n/a [ - ]):
    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:691:in `run'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:138:in `run'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
    org/jruby/RubyKernel.java:994:in `load'
    uri:classloader:/META-INF/main.rb:1:in `<main>'
    org/jruby/RubyKernel.java:970:in `require'
    uri:classloader:/META-INF/main.rb:1:in `(root)'
    uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

20/11/18 15:33:21 INFO Client: Application report for application_1605712699264_0002 (state: RUNNING)
20/11/18 15:33:22 INFO Client: Application report for application_1605712699264_0002 (state: FINISHED)
20/11/18 15:33:22 INFO Client: 
	 client token: N/A
	 diagnostics: User class threw exception: org.apache.spark.SparkException: Job aborted.
	 ApplicationMaster host: 172.30.1.222
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1605712888343
	 final status: FAILED
	 tracking URL: http://ip-172-30-15-104.ec2.internal:20888/proxy/application_1605712699264_0002/
	 user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1605712699264_0002 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/11/18 15:33:22 INFO ShutdownHookManager: Shutdown hook called
20/11/18 15:33:22 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-01569491-7d76-4b63-a602-949479a43d51
Command exiting with ret '1'
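
That Spark client output is the failed shred step’s stderr from our EMR log bucket. For completeness, this is roughly how I pulled it (the log bucket and step ID below are placeholders, not our real ones):

```bash
# Find the ID of the failed step on the cluster named in the error message.
aws emr list-steps --cluster-id j-26AORV9NANGGB --step-states FAILED

# EMR writes each step's stderr under <log_uri>/<cluster-id>/steps/<step-id>/.
# Placeholder bucket and step ID -- substitute your own.
aws s3 cp s3://my-emr-logs/j-26AORV9NANGGB/steps/s-XXXXXXXXXXXX/stderr.gz - \
  | zcat | tail -n 60
```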

Hi @mjensen,

Have you managed to solve the problem? One more piece of advice I can give for troubleshooting is to look at the YARN container logs, somewhere in containers/application_1605712699264_0002/ in your EMR logs folder.
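
If it helps, something like this pulls those container logs down so you can grep them locally; the log bucket name is a placeholder, while the jobflow and application IDs come from your output above:

```bash
# Placeholder log bucket -- use the log_uri configured for your EMR cluster.
LOG_BUCKET=s3://my-emr-logs

# Pull down the YARN container logs for the failed Spark application.
aws s3 cp --recursive \
  "$LOG_BUCKET/j-26AORV9NANGGB/containers/application_1605712699264_0002/" \
  ./container-logs/

# EMR gzips the logs; search the stderr files for the underlying exception.
zgrep -iE -A 5 "exception|error" ./container-logs/*/stderr.gz
```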

But as you’ve noticed, your pipeline is quite far behind the latest version, so if nothing useful is found in the container logs, I’d recommend upgrading the pipeline first.