Scheduling EMR ETL and sql-runner

trung · April 17, 2019, 1:42am

It seems that tools like Factotum can help with scheduling the EMR ETL runner and sql-runner.

However, the problem is that the EMR ETL runner is asynchronous: it fires, and the EMR cluster starts up and does its thing.

How can I schedule sql-runner to run after the EMR cluster completes its work?
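If the launcher really does return before the cluster has finished, one option is to poll the cluster's state and only start sql-runner once it has terminated. This is a minimal sketch, assuming you know the cluster ID; `wait_for_cluster` and its parameters are hypothetical names. It treats the two EMR terminal states as "done" and leaves the success check to the caller.

```python
import time
from typing import Callable

# EMR cluster states after which the cluster is no longer doing any work.
TERMINAL_STATES = {"TERMINATED", "TERMINATED_WITH_ERRORS"}


def wait_for_cluster(
    get_state: Callable[[], str],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 6 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it reports a terminal EMR state or the timeout expires.

    get_state is a hypothetical hook: any zero-argument function returning the
    cluster's current state string. sleep is injectable so the loop can be tested.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"cluster still {state} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With boto3, `get_state` could be `lambda: emr.describe_cluster(ClusterId=cluster_id)["Cluster"]["Status"]["State"]`, where `emr = boto3.client("emr")` and `cluster_id` is whatever ID your launcher reports. A `TERMINATED_WITH_ERRORS` result is a reason to skip sql-runner and alert instead. Note that `TERMINATED` on its own does not prove every step succeeded.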

grzegorzewald · April 17, 2019, 6:24am

Hi @trung,

I have been facing the same problem in recent weeks. As my implementation is AWS-based, I use AWS Step Functions (specifically a state machine) for scheduling. In my case, both the EMR ETL Runner and SQL Runner run in containers (one classic, one Fargate). In both cases, the main process returns an exit code based on the actual outcome (0 if everything is OK). Based on that exit code, I decide on the following steps and send error notifications to myself and other interested parties.
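The exit-code gating described above can be sketched in plain Python as a stand-in for the state machine: run each step in order, stop at the first non-zero exit code, and send a notification. This is not Step Functions itself; `run_pipeline` and `notify` are hypothetical names, and the step commands are placeholders for however you launch the EMR ETL Runner and SQL Runner containers.

```python
import subprocess
from typing import Callable, Sequence


def run_pipeline(
    steps: Sequence[tuple[str, Sequence[str]]],
    notify: Callable[[str], None],
) -> bool:
    """Run each (name, command) step in order.

    A step only starts if every previous step exited with code 0. On the first
    non-zero exit code, notify() is called and the remaining steps are skipped.
    Returns True only if all steps succeeded.
    """
    for name, command in steps:
        result = subprocess.run(list(command))
        if result.returncode != 0:
            notify(f"step {name!r} failed with exit code {result.returncode}")
            return False
    return True
```

In a state machine, the same idea becomes one task per container followed by a branch that routes failures to a notification step instead of the next task.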

Additionally, I have a tiny Lambda function that helps me fire some SQL on a weekly and monthly basis.
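A weekly/monthly trigger like that Lambda mostly comes down to deciding which SQL is due on a given day. A minimal sketch, assuming the function is invoked once a day by a scheduled EventBridge (CloudWatch Events) rule, whose events carry a UTC timestamp in the `time` field. The playbook file names and the Monday/first-of-month rules are hypothetical choices, and actually executing the SQL is left out.

```python
from datetime import date, datetime


def playbooks_due(day: date) -> list[str]:
    """Return the (hypothetical) SQL playbooks that should run on the given day."""
    due = []
    if day.weekday() == 0:  # Monday: run the weekly rollup
        due.append("weekly.yml")
    if day.day == 1:  # first day of the month: run the monthly rollup
        due.append("monthly.yml")
    return due


def handler(event, context):
    """Lambda entry point for a daily scheduled event."""
    # Scheduled events carry an ISO-8601 UTC timestamp such as "2019-04-01T06:00:00Z".
    fired_at = datetime.strptime(event["time"], "%Y-%m-%dT%H:%M:%SZ")
    due = playbooks_due(fired_at.date())
    # Running the SQL itself (e.g. over a database connection) is not part of this sketch.
    return {"playbooks": due}
```

Two separate cron rules (one weekly, one monthly) pointing at the same function would avoid the date logic entirely; a single daily trigger keeps the schedule in code instead.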

If you need more details, feel free to drop a line.

Cheers,
GE

Colm · April 17, 2019, 10:41am

How can I schedule sql-runner to run after the EMR cluster completes its work?

You could use a single DAG for both and have the sql-runner job as the second step, which depends on the first.
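A sketch of the single-DAG idea in Python, using the standard library's `graphlib` (Python 3.9+) for ordering. It is not Factotum's file format; `run_dag` and the task names are hypothetical, and each task is modelled as a callable returning a process-style exit code. A task runs only if all of its dependencies succeeded, so a failed EMR ETL step means sql-runner is skipped.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Each task: (names of the tasks it depends on, callable returning an exit code).
Task = tuple[set[str], Callable[[], int]]


def run_dag(tasks: dict[str, Task]) -> dict[str, str]:
    """Run tasks in dependency order; skip any task whose dependency did not succeed.

    Returns a mapping of task name to "succeeded", "failed" or "skipped".
    """
    graph = {name: deps for name, (deps, _) in tasks.items()}
    status: dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        deps, action = tasks[name]
        if any(status.get(dep) != "succeeded" for dep in deps):
            status[name] = "skipped"
            continue
        status[name] = "succeeded" if action() == 0 else "failed"
    return status
```

Declaring the dependency explicitly, rather than relying on two independent cron times, means sql-runner never starts against a half-finished run.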

Related topics

Topic	Category	Replies	Views	Activity
Scheduling EmrEtlRunner and StorageLoader	Enrichment	2	1264	April 12, 2016
Cron Job for emr-etl and snowflake data	Enrichment	4	1547	March 19, 2020
Dataflow-runner - EMR cluster not terminated after completion	Enrichment	7	2066	June 1, 2020
Dataflow Runner released	New releases	2	1576	February 11, 2017
Dataflow Runner setup	For engineers	3	932	February 11, 2022