Core_Instance_Count not increasing

Hi @sp_user,

It appears you are using EMR ETL Runner R117, which has a known issue with core_instances and EBS volumes (see: https://github.com/snowplow/snowplow/issues/4285). The issue has been fixed in the latest version.

As for the correct configuration for large data sets: you will need to specify additional configuration settings so that the job uses as much of the cluster's resources as possible. I would recommend reading this thread to get a sense of how this can be done. You may want to use one of the configurations provided in the thread (e.g. 1x m4.xlarge & 5x r4.8xlarge).
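To give a rough idea of where these settings live, here is a minimal sketch of the relevant part of config.yml for the 1x m4.xlarge & 5x r4.8xlarge example. The key names follow the sample EmrEtlRunner config as I remember it, so please check them against the sample config for your release. The EBS volume size is a placeholder, and the `<...>` values are deliberately left unfilled: take them from the thread, don't treat anything here as tuned numbers.

```yaml
# Fragment only: the cluster-sizing keys of an EmrEtlRunner config.yml
aws:
  emr:
    jobflow:
      master_instance_type: m4.xlarge    # 1x master
      core_instance_count: 5             # 5x core nodes
      core_instance_type: r4.8xlarge
      core_instance_ebs:                 # r4 instances are EBS-only, so a volume must be attached
        volume_size: 320                 # GB; placeholder, size it to your data volume
        volume_type: "gp2"
        ebs_optimized: true
      task_instance_count: 0
    configuration:
      spark:
        maximizeResourceAllocation: "false"   # assumption: sized by hand via spark-defaults below
      spark-defaults:
        spark.dynamicAllocation.enabled: "false"
        spark.executor.instances: "<from the thread>"
        spark.executor.cores: "<from the thread>"
        spark.executor.memory: "<from the thread>"
        spark.default.parallelism: "<from the thread>"
```

The `spark-defaults` values depend on the instance type and count, so they need to be changed whenever you change `core_instance_type` or `core_instance_count`.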

Overall, it's better to run the job more often and process less data per run. That should be a more robust and cost-efficient model.

Hope this helps.