My last EMR cluster failed at the rdb_load step with the following logs:
15:13:01.199: Consistency check failed. Making another attempt
15:13:11.306: Consistency check failed. Making another attempt
15:13:21.442: Consistency check failed. Making another attempt
15:13:31.529: Consistency check did not pass after 5 attempts
15:13:31.538: Data discovery error with following issues:
Folder with atomic-events was not found in [s3:/bucket/shredded/good/run=2021-05-06-12-00-21/]
I couldn’t find a solution for this error.
I tried running a new job resuming from rdb_load, but it failed with the same error.
@guillaume, I suspect you have a lot of empty (0-byte) directories and/or files under s3:/bucket/shredded/good/. You need to delete them so the app can see your data in the run=2021-05-06-12-00-21 folder (provided that folder does contain data). It’s a good idea to clean up periodically to prevent this error.
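If it helps, here is a rough Python sketch for spotting those 0-byte entries before deleting anything. It assumes boto3 is installed and AWS credentials are configured. The function names `empty_keys` and `list_objects` are made up for illustration and are not part of any Snowplow tool.

```python
from typing import Iterable, Iterator


def empty_keys(objects: Iterable[dict]) -> Iterator[str]:
    """Yield the keys of zero-byte objects (folder placeholders, empty files)."""
    for obj in objects:
        if obj["Size"] == 0:
            yield obj["Key"]


def list_objects(bucket: str, prefix: str) -> Iterator[dict]:
    """List every object under a prefix (assumes boto3 and AWS credentials)."""
    import boto3  # imported here so empty_keys can be used without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no objects
        yield from page.get("Contents", [])
```

Calling `list(empty_keys(list_objects("bucket", "shredded/good/")))` with your own bucket name and prefix should return the candidates for clean-up. Review the list by hand before removing anything, since this only reports keys and does not delete them.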
@guillaume, that looks fine as long as there are no other files or folders in s3:/bucket/shredded/good/. It could be the infamous AWS eventual consistency issue, where a listing does not yet reflect the actual contents of the bucket.
You could try resuming with --skip consistency_check.