Dataflow-runner - EMR cluster not terminated after completion

Hey @Ryan_Newsome,

Don’t apologise! We’re happy to see you’re engaged and asking for help!

The EMR cluster should shut down if it’s created as a transient cluster. Otherwise, it’ll be persistent and will just wait for the next job to run.

At volume, it can be more efficient to run the load jobs on a persistent cluster and load in a micro-batch style (i.e. kicking off a new job on the same cluster as soon as the last job finishes), since the cluster takes time to spin up and down again and you pay for that time on every run.

If you don’t need that, I believe you’ll just need to change the config that creates the EMR cluster (if memory serves, it’s part of your dataflow-runner configuration).
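
For context, here’s a rough sketch of what the transient/persistent difference looks like at the EMR API level. It only builds the `Instances` fragment you’d pass to boto3’s `run_job_flow`; the helper name `emr_instances` is made up for illustration, and this isn’t how dataflow-runner itself is configured, so treat it as background rather than the exact fix.

```python
def emr_instances(transient: bool) -> dict:
    """Build the keep-alive part of the Instances argument for EMR's run_job_flow.

    Hypothetical helper for illustration only. A real request also needs
    instance types, counts, and so on, which are omitted here.
    """
    return {
        # False: EMR terminates the cluster once it has no steps left (transient).
        # True: the cluster stays up and waits for more steps (persistent).
        "KeepJobFlowAliveWhenNoSteps": not transient,
    }


transient_cfg = emr_instances(transient=True)
persistent_cfg = emr_instances(transient=False)
```

If your cluster is hanging around after the job finishes, it’s most likely been launched the persistent way.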

Here’s a similar thread on the topic, which might help: EMR ETL stream_enrich mode

Best,