How to re-run a job that fails at the processing stage?

rajan · August 3, 2016, 11:40am

Hello,

I had a few problems with EmrEtlRunner. Here is the link to the problem. I found a solution for that problem, but this job failed at the processing stage. If I rerun this job, will all the logs be moved to Redshift? I also have a few more raw logs. Will those raw logs be processed during this rerun? I want all the logs to be moved to Redshift. Where should I look, and what steps do I have to follow to achieve this?

Any pointers to a solution would be much appreciated.

ihor · August 3, 2016, 10:36pm

Hi @rajan,

Please review the diagram and instructions on the Batch Pipeline Steps wiki page.

In short, if there are files left in your processing bucket (due to a failure at that stage), you need to rerun the job with the --skip staging option.

Once those files have been processed successfully (including loading to Redshift), you can run the job again as usual (without the --skip option) to pick up the remaining “raw” files.
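The two-run recovery described above can be sketched as a small helper that returns the EmrEtlRunner argument lists to use, in order. This is a minimal illustration, not part of Snowplow: the function name recovery_plan is hypothetical, and it assumes each run (including the StorageLoader load into Redshift) completes successfully before the next one starts. Only the --skip staging option comes from the advice above.

```python
from typing import List


def recovery_plan(processing_has_files: bool) -> List[List[str]]:
    """Hypothetical helper: EmrEtlRunner argument lists to run, in order.

    Assumes each run, including the load into Redshift, finishes
    successfully before the next one is started.
    """
    plan: List[List[str]] = []
    if processing_has_files:
        # Finish the failed batch already sitting in the processing bucket
        # first, without running the staging step.
        plan.append(["--skip", "staging"])
    # A normal run (no --skip option) then stages and processes the
    # remaining raw files.
    plan.append([])
    return plan
```

For a job that failed at the processing stage, recovery_plan(True) yields two runs: one with --skip staging, then one with no extra arguments.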

The point is that the job will not start from the very beginning if there are files in any of the following buckets:

processing (cleared by EMR job)
enriched/good (cleared by StorageLoader)
shredded/good (cleared by StorageLoader)

They are all cleared during EMR and StorageLoader job runs. Any remaining files would indicate a job failure at some stage.
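Because leftover files are the signal of a failed run, one way to check before rerunning is to ask S3 whether each of these locations still holds any object. The sketch below assumes a boto3 S3 client is passed in (its list_objects_v2 call with MaxKeys=1 is enough to detect a non-empty prefix). The bucket and prefix names in LOCATIONS are hypothetical placeholders, not Snowplow defaults; substitute the locations from your own configuration.

```python
from typing import Dict, List, Tuple

# Hypothetical bucket/prefix names; replace them with the locations from
# your own EmrEtlRunner/StorageLoader configuration.
LOCATIONS: Dict[str, Tuple[str, str]] = {
    "processing": ("my-snowplow-bucket", "processing/"),
    "enriched/good": ("my-snowplow-bucket", "enriched/good/"),
    "shredded/good": ("my-snowplow-bucket", "shredded/good/"),
}


def non_empty_locations(s3_client, locations=LOCATIONS) -> List[str]:
    """Return the names of locations that still contain at least one object.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    leftovers: List[str] = []
    for name, (bucket, prefix) in locations.items():
        # One key is enough to know the location is not empty.
        resp = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
        if resp.get("KeyCount", 0) > 0:
            leftovers.append(name)
    return leftovers
```

An empty result suggests the previous run finished cleanly; a result such as ["processing"] points to a failure at the EMR stage, which is the case where the --skip staging rerun applies.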

Again, please follow the instructions on the wiki page mentioned above. It shows how to rerun the job depending on which step failed.

–Ihor

rajan · August 4, 2016, 1:45pm

Thank you for the support. I was finally able to get EmrEtlRunner running successfully.
