Hello, I am getting a failure in the step "Elasticity S3DistCp Step: Raw S3 -> Raw HDFS".
This doesn’t occur on every run, but it did occur on my latest one. The problem appears to be that, when copying files from S3 to HDFS, the step attempts to copy a file that doesn’t exist.
Here is the error:
Error: java.lang.RuntimeException: Reducer task failed to copy 272 files: s3://production-snowplow-processing-data/processing/EWBPWVW3GFOLK.2018-09-13-11.4d9fdf5f.gz etc
at java.security.AccessController.doPrivileged(Native Method)
I checked whether the file exists with:
aws s3api get-object-acl --bucket production-snowplow-processing-data --key /processing/EWBPWVW3GFOLK.2018-09-13-11.4d9fdf5f.gz
The response was: An error occurred (NoSuchKey) when calling the GetObjectAcl operation: The specified key does not exist.
@frankcash, I’m afraid we do not have a solution for this at the moment. Eventual consistency is part of the AWS S3 service; it is a side effect of how S3 provides reliability. We do experience it from time to time, but it normally occurs at the data load step rather than at the staging step. In future, the solution could be to keep track of all the files processed and to skip any file that still appears to be present but is already recorded as processed (again, this relates to eventual consistency during data load).
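To make the "keep track and skip" idea concrete, here is a minimal sketch in Python. It is not how Snowplow or S3DistCp work today. The function names `select_unprocessed` and `mark_processed` and the in-memory set are hypothetical, and a real version would need to keep the record of processed files in durable storage shared between runs.

```python
from typing import Iterable, List, Set


def select_unprocessed(listed_keys: Iterable[str], processed: Set[str]) -> List[str]:
    """Return the listed keys that are not yet recorded as processed.

    `listed_keys` stands for whatever the bucket listing returned, which
    may be stale under eventual consistency (hypothetical input).
    """
    return [key for key in listed_keys if key not in processed]


def mark_processed(keys: Iterable[str], processed: Set[str]) -> None:
    """Record keys as processed so that later runs skip them."""
    processed.update(keys)


if __name__ == "__main__":
    processed: Set[str] = set()  # assumption: in practice a durable manifest

    # First run: both files are new, so both are selected and recorded.
    first_listing = ["processing/a.gz", "processing/b.gz"]
    batch = select_unprocessed(first_listing, processed)
    mark_processed(batch, processed)

    # Later run: a stale listing still shows a.gz although it was handled.
    stale_listing = ["processing/a.gz", "processing/c.gz"]
    print(select_unprocessed(stale_listing, processed))
    # prints ['processing/c.gz']
```

The point of the design is that the decision to process a file comes from the recorded set rather than from the listing alone, so a file that lingers in a stale listing is ignored instead of failing the copy.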