Reading back through the thread, our solution to this was to configure our cluster better. I recommend reading the thread "Learnings from using the new Spark EMR Jobs", particularly the references to the source spreadsheets.
You can also try setting the max retries to 1, which will probably stop the retries from masking the true error.
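
As a rough sketch of what that could look like: I'm assuming here that "max retries" corresponds to Spark's own `spark.task.maxFailures` and `spark.yarn.maxAppAttempts` properties. Your job runner may expose a different retry setting, so treat the mapping as a guess. The helper name `to_spark_submit_args` is made up for illustration; it only turns a config dict into `spark-submit --conf` arguments.

```python
# Assumption: "max retries" maps to these two Spark properties; your job
# runner may expose its own retry setting instead.
FAIL_FAST_CONF = {
    "spark.task.maxFailures": "1",     # give up on the job after a task's first failure
    "spark.yarn.maxAppAttempts": "1",  # YARN makes only one attempt at the application
}


def to_spark_submit_args(conf):
    """Render a config dict as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Paste the printed flags into your spark-submit command line.
    print(" ".join(to_spark_submit_args(FAIL_FAST_CONF)))
```

With both set to 1, the first failure should end the run, so the exception in the logs is the original one rather than a later retry's.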