I have verified that the input bucket is not empty. If I use the option --skip staging, the behavior is the same, but I get
D, [2017-06-08T15:26:33.452000 #23977] DEBUG -- : Initializing EMR jobflow
instead of
D, [2017-06-08T15:23:57.149000 #23957] DEBUG -- : Staging raw logs...
I’d be happy to provide more information if it’s useful.
I suspect your “processing” bucket is not empty. Thus, the runner assumes the previous job hasn’t completed yet. It is also possible that the previous run failed and left the files in the bucket(s) unarchived.
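One way to confirm whether a location still holds files is to ask S3 for a single key under its prefix. The following is only a rough sketch: it assumes the boto3 library and working AWS credentials, and the bucket and prefix names in the usage comment are hypothetical placeholders, not values from this thread. It also is not how EmrEtlRunner itself performs the check.

```python
def prefix_is_empty(s3_client, bucket, prefix):
    """Return True when no object key exists under the given prefix.

    s3_client is expected to be a boto3 S3 client (an assumption of this
    sketch). Only one key is requested, since a single hit is enough to
    show that the location is not empty.
    """
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    # KeyCount is the number of keys returned in this response.
    return response.get("KeyCount", 0) == 0


# Example usage (hypothetical bucket and prefix names):
#
#   import boto3
#   s3 = boto3.client("s3")
#   for prefix in ("processing/", "enriched/good/", "shredded/good/"):
#       print(prefix, "empty" if prefix_is_empty(s3, "my-snowplow-bucket", prefix) else "NOT empty")
```

Note that a zero-byte "folder" placeholder object created through the S3 console counts as a key here, so a prefix can be reported as not empty even when it contains no data files.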
As ihor said, it is probably because the script considers your enriched bucket to be non-empty.
You can check the exit code of the script and check to which error it corresponds in
I think we could improve on the error reporting here
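For anyone unsure where the exit code shows up: in a shell, `echo $?` run immediately after the script prints the exit status of the last command. The same thing can be sketched in Python with a small wrapper that runs a command and reports its return code. The wrapper name and the way the command is passed in are illustrative assumptions; substitute whatever command line you normally use.

```python
import subprocess
import sys


def run_and_report(command):
    """Run command (a list of arguments), wait for it, and return its exit code."""
    completed = subprocess.run(command)
    return completed.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the wrapper's own name is treated as the command to run.
    code = run_and_report(sys.argv[1:])
    print("exit code:", code)
    # Propagate the same code so callers of the wrapper see it too.
    sys.exit(code)
```

An exit code only exists once the process has terminated, so a run that hangs will not report one until it exits or is stopped.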
Hi all, thanks for your replies. There were a couple of issues, some of which are now resolved. It was indeed the case that my enriched folder was not empty, and the EmrEtlRunner functioned after I cleared it manually.
I’m not sure where to view the exit code you’re referring to. When I run the script, it hangs after “GET request … finished with status code 200” and does NOT archive the results of the enrich or shred process. As a result, I’ve had to manually move files from shredded/good to shredded/archive to get the runner to work each time.