When we switched to a larger node type, we got an error from the last step of shredding, "Elasticity S3DistCp Step: Shredded HDFS -> S3":
Error: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down; Request ID: 5A2F87935C17C792), S3 Extended Request ID: 6YcZaPRh5xyaWrQUz9KDpRyKhiGt59QcWVIXNvsOxk1oNRegZX6CgEN1974w1c0eIN35YgzTe/I=
According to AWS, this happens when a lot of data is pushed to S3 too aggressively. The suggested mitigations are either to set “--targetSize=SIZE” to a larger size or to enable EMRFS consistent view: http://docs.aws.amazon.com/emr/latest/ManagementGuide/emrfs-configure-consistent-view.html
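As far as I can tell from the AWS docs, enabling consistent view on a plain EMR cluster means supplying an `emrfs-site` configuration classification. Below is a minimal Python sketch that builds that JSON. The function name and the retry values are my own illustrative assumptions, and I don't know whether or how this maps onto config.yml.

```python
import json


def emrfs_consistent_view_config(retry_count=5, retry_period_seconds=10):
    """Build an EMR configuration list that turns on EMRFS consistent view.

    The retry values are illustrative assumptions, not tuned recommendations.
    """
    return [
        {
            "Classification": "emrfs-site",
            "Properties": {
                # Switches EMRFS consistent view on.
                "fs.s3.consistent": "true",
                # How many times EMRFS retries when S3 looks inconsistent.
                "fs.s3.consistent.retryCount": str(retry_count),
                # Seconds to wait between those retries.
                "fs.s3.consistent.retryPeriodSeconds": str(retry_period_seconds),
            },
        }
    ]


if __name__ == "__main__":
    # Print the JSON that would be passed to EMR as cluster configurations.
    print(json.dumps(emrfs_consistent_view_config(), indent=2))
```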
Given that we are using snowplow-emr-etl-runner, can we modify config.yml to implement either of these suggestions? If so, what is a good way to do it?
Thanks,
Richard