Dataflow-runner - EMR cluster not terminated after completion

Colm · May 22, 2020, 11:59am

Don’t apologise! We’re happy to see you’re engaged and asking for help!

The EMR cluster should shut down if it’s created as a transient cluster. Otherwise, it’ll be persistent and will just wait for the next job to run.
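If you want to confirm which kind of cluster you ended up with, here is a minimal sketch, assuming you have boto3 and AWS credentials available. It reads the `AutoTerminate` field from the EMR `DescribeCluster` response. The helper names `is_transient` and `fetch_and_check` are made up for illustration and are not part of Dataflow Runner.

```python
from typing import Any, Dict


def is_transient(describe_cluster_response: Dict[str, Any]) -> bool:
    """Return True if EMR reports the cluster as auto-terminating.

    Expects the dict shape returned by boto3's EMR describe_cluster call.
    A missing field is treated as "not transient" (an assumption of this sketch).
    """
    cluster = describe_cluster_response.get("Cluster", {})
    return bool(cluster.get("AutoTerminate", False))


def fetch_and_check(cluster_id: str, region: str) -> bool:
    """Look up a live cluster and report whether it is transient."""
    # Imported lazily so is_transient can be used without boto3 installed.
    import boto3

    emr = boto3.client("emr", region_name=region)
    return is_transient(emr.describe_cluster(ClusterId=cluster_id))
```

Calling `fetch_and_check` with your own cluster ID and region returns `False` for a persistent cluster, which would explain why it is still waiting around after the job completes.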

At volume, it can be more efficient to run the load jobs on a persistent cluster and load in a micro-batch style (i.e. kicking off a new job on the same cluster as soon as the last job finishes), since there’s a cost to the time that the cluster takes to spin up and down again.

If you don’t need that, I believe you’ll just need to make a change in the config which creates the EMR cluster (if memory serves it’s part of your dataflow-runner configuration).

Here’s a similar thread on the topic, which might help: EMR ETL stream_enrich mode

Best,
