Hello,
I have Snowplow batch running on AWS (scala-collector > s3-loader > EmrEtlRunner).
It was running fine for the past few weeks, but lately I’ve been getting a lot of failures during the raw S3 staging step.
The step fails with the following trace in stderr:
Error: java.lang.RuntimeException: Reducer task failed to copy 2275 files: s3://snowplow/raw/in/2018-10-24-49589377919602874491714939496115412362808439243580375074-49589377919602874491714939496115412362808439243580375074.lzo.index etc
at com.amazon.elasticmapreduce.s3distcp.CopyFilesReducer.cleanup(CopyFilesReducer.java:67)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:179)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:635)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
To recover, I have to manually move the files from the raw/processing folder back to the raw/in folder and re-run the job, hoping it won’t fail again.
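For reference, the manual recovery I do looks roughly like this (the bucket name is from my setup, and the config/resolver file names are just examples):

```shell
# Move the raw files that s3DistCp left behind in processing/ back to in/
# so the next run can stage them again (example bucket/prefixes):
aws s3 mv s3://snowplow/raw/processing/ s3://snowplow/raw/in/ --recursive

# Then kick off EmrEtlRunner again:
./snowplow-emr-etl-runner run -c config.yml -r resolver.json
```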
If I look at the container logs, I can see the following error:
2018-10-23 18:32:19,725 ERROR [s3distcp-simpler-executor-worker-1] com.amazon.elasticmapreduce.s3distcp.CopyFilesRunnable: Error downloading input files. Not marking as committed
java.io.FileNotFoundException: No such file or directory 's3://snowplow/raw/in/2018-10-23-49588889455877140086970628804200750496158524777810624562-49588889455877140086970628809616738168032063824091676722.lzo.index'
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:816)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.open(S3NativeFileSystem.java:1194)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:773)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.open(EmrFileSystem.java:166)
at com.amazon.elasticmapreduce.s3distcp.CopyFilesReducer.openInputStream(CopyFilesReducer.java:293)
at com.amazon.elasticmapreduce.s3distcp.CopyFilesRunnable.mergeAndCopyFiles(CopyFilesRunnable.java:102)
at com.amazon.elasticmapreduce.s3distcp.CopyFilesRunnable.run(CopyFilesRunnable.java:35)
at com.amazon.elasticmapreduce.s3distcp.SimpleExecutor$Worker.run(SimpleExecutor.java:49)
at java.lang.Thread.run(Thread.java:748)
However, the file 2018-10-23-49588889455877140086970628804200750496158524777810624562-49588889455877140086970628809616738168032063824091676722.lzo.index does actually exist in the bucket.
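I checked this with the AWS CLI, e.g.:

```shell
# Listing the exact key s3DistCp claims is missing returns it fine:
aws s3 ls s3://snowplow/raw/in/2018-10-23-49588889455877140086970628804200750496158524777810624562-49588889455877140086970628809616738168032063824091676722.lzo.index

# head-object also succeeds and shows the object metadata, including the
# server-side encryption settings (relevant since bucket encryption is on):
aws s3api head-object \
  --bucket snowplow \
  --key raw/in/2018-10-23-49588889455877140086970628804200750496158524777810624562-49588889455877140086970628809616738168032063824091676722.lzo.index
```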
Any idea whether this is a problem with EmrEtlRunner or with S3DistCp itself, and how it could be solved?
ami_version: 5.9.0
rdb_loader: 0.14.0
rdb_shredder: 0.13.1
spark_enrich: 1.16.0
S3 bucket encryption turned on
Thank you!
Arthur