Hi Asaf,
When the EMR process runs, any data that cannot be successfully processed is written to the ‘bad’ bucket. It is normal for several of these lines to be generated with each run. We have guides on using these bad rows to debug upstream issues here, here and here.
However, writing bad rows is non-blocking: it does not stop the job. The reason your EMR job is failing will not be related to the bad rows.
When the EMR job fails, you should get an error message back from EmrEtlRunner. In addition, it should be possible to look in the AWS EMR console and see an error message there.
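If the console is awkward to dig through, the same failure messages can usually be pulled programmatically. The following is only a rough sketch, assuming Python with boto3 installed and AWS credentials already configured. The function names and the cluster ID are placeholders made up for illustration, not part of EmrEtlRunner.

```python
def failed_step_messages(list_steps_response):
    """Collect (step name, message) pairs for FAILED steps in an EMR ListSteps response."""
    messages = []
    for step in list_steps_response.get("Steps", []):
        status = step.get("Status", {})
        if status.get("State") != "FAILED":
            continue
        # FailureDetails is not always present, so fall back to StateChangeReason.
        message = status.get("FailureDetails", {}).get("Message") or status.get(
            "StateChangeReason", {}
        ).get("Message", "")
        messages.append((step.get("Name", ""), message))
    return messages


def fetch_failed_steps(cluster_id):
    """Query EMR for the failed steps of one cluster (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3 installed

    emr = boto3.client("emr")
    response = emr.list_steps(ClusterId=cluster_id, StepStates=["FAILED"])
    return failed_step_messages(response)


# Hypothetical usage; replace the placeholder with your own cluster ID:
# for name, message in fetch_failed_steps("j-XXXXXXXXXXXXX"):
#     print(name, "->", message)
```

This only reads the first page of steps, which should be enough for a typical run with a handful of steps.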
If you can share those error messages with us, we’ll be in a better position to help you diagnose the root cause of the failures.
All the best,
Yali