Shred step failure, no error message

Hi,

With snowplow-emr-etl-runner-r117, our ETL job is failing at the “[shred] spark: Shred Enriched Events” step. Lots of *.gz files are left in S3 under enriched/good/run=2021-06-01-08-30-14/stream/.

stderr for the failed step doesn’t offer much of a clue:

21/06/01 16:07:05 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.0.1.79
ApplicationMaster RPC port: 0
queue: default
start time: 1622563598944
final status: FAILED
tracking URL: http://ip-10-0-1-129.ec2.internal:20888/proxy/application_1622563416514_0002/
user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1622563416514_0002 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/06/01 16:07:05 INFO ShutdownHookManager: Shutdown hook called

When I try to restart the job with "-f shred", I get the same error.

I am trying to troubleshoot a service that was installed by Someone Who Is No Longer With The Company, so I am really groping in the dark here.

Is there another place I should be looking for more informative error logs?

Any advice appreciated.

@wleftwich, where does the final folder “stream” (in enriched/good/run=2021-06-01-08-30-14/stream/) come from? No folders are expected under the “run=” folder.
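For reference, a healthy staged run folder keeps its gzipped part files directly under the run= prefix, with no subfolders, roughly like this (bucket and file names illustrative):

s3://<events-bucket>/enriched/good/run=2021-06-01-08-30-14/
  part-00000.gz
  part-00001.gz
  ...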

Also, if it fails again (after shredding runs for a while), consider bumping up the EMR cluster, as the data volume could be too high for the current EMR cluster configuration.
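If resizing does become necessary, the cluster size is set in the emr:jobflow section of the runner's config.yml; a minimal sketch, assuming the stock EmrEtlRunner config layout (instance types and counts here are illustrative, not a recommendation):

emr:
  jobflow:
    master_instance_type: m4.large
    core_instance_count: 3        # more core nodes = more Spark executors for shredding
    core_instance_type: r4.xlarge # memory-heavy instances suit the Spark shred step
    task_instance_count: 0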

Hi @ihor, thanks for the fast reply.

I don’t know where the /stream comes from – it is not in the emr-etl-runner config.yml:
s3:
  buckets:
    assets: s3://snowplow-hosted-assets
    encrypted: false
    enriched:
      archive: s3://rr-snowplow-events-e2-prod/enriched/archive
      bad: s3://rr-snowplow-events-e2-prod/enriched/bad
      errors: null
      good: s3://rr-snowplow-events-e2-prod/enriched/good
      stream: s3://rr-snowplow-enriched-stream-e2-prod
    jsonpath_assets: s3://rr-snowplow-cloudfront-iglu-central/jsonpaths/
    log: s3://rr-snowplow-events-e2-prod/emr_logs
    shredded:
      archive: s3://rr-snowplow-events-e2-prod/shredded/archive
      bad: s3://rr-snowplow-events-e2-prod/shredded/bad
      errors: null
      good: s3://rr-snowplow-events-e2-prod/shredded/good
  consolidate_shredded_output: false
  region: <%= ENV['RR_SNOWPLOW_REGION'] %>

Maybe it is coming from the enriched.stream folder?

~ $ aws s3 ls s3://rr-snowplow-enriched-stream-e2-prod/
PRE stream/
2020-09-03 06:51:55 0 stream_$folder$

At any rate, I will try just moving all the *.gz files up a level.
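For anyone following along, a one-off move along these lines should do it (bucket and run folder as above; just a sketch, so double-check the paths before running):

aws s3 mv \
  s3://rr-snowplow-events-e2-prod/enriched/good/run=2021-06-01-08-30-14/stream/ \
  s3://rr-snowplow-events-e2-prod/enriched/good/run=2021-06-01-08-30-14/ \
  --recursive --exclude "*" --include "*.gz"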

Thanks again @ihor. I followed both your suggestions and got back in business.

– Wade

Hey @wleftwich, glad to hear that.

Maybe it is coming from the enriched.stream folder?

It shouldn’t, unless your S3 Loader is configured to upload the streamed data to that folder. At the staging step, the files are simply moved from the enriched:stream to the enriched:good location, as per this dataflow diagram.
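For example, if the loader's S3 output were pointed at a sub-path instead of the bucket root, everything would land under a stream/ prefix. A sketch of what that could look like in the S3 Loader's HOCON config (key names follow the loader's sample config; the region and sub-path values are assumptions about your setup):

s3 {
  region = "us-east-1"
  # a sub-path after the bucket name would nest all output under stream/
  bucket = "rr-snowplow-enriched-stream-e2-prod/stream"
  format = "gzip"
  maxTimeout = 300000
}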