@Louai_Ghalia, it looks like @pkutaj provided you with an example for EmrEtlRunner, not for the Snowflake Transformer/Loader. If you are transforming ~10 GB of compressed enriched data, you need a more powerful EMR cluster than the one you currently use. However, given that with a more powerful cluster you will no longer accumulate as much of a backlog, I think you should start with the configuration below and then scale it down later. Note that your Spark is not configured properly either, which would leave the resources of your EMR cluster underutilized.
Here is an example utilizing generation 4 EC2 instance types (the same generation you currently use), but you could do better with generation 5 types and a bumped AMI version (depending on the version of your Snowflake Transformer).
{
  "instances": {
    "master": {
      "type": "m4.xlarge"
    },
    "core": {
      "type": "r4.8xlarge",
      "count": 3,
      "ebsConfiguration": {
        "ebsOptimized": true,
        "ebsBlockDeviceConfigs": [
          {
            "volumesPerInstance": 1,
            "volumeSpecification": {
              "iops": 1500,
              "sizeInGB": 320,
              "volumeType": "io1"
            }
          }
        ]
      }
    },
    "task": {
      "type": "m4.large",
      "count": 0,
      "bid": "0.015"
    }
  },
  "configurations": [
    {
      "classification": "yarn-site",
      "properties": {
        "yarn.nodemanager.vmem-check-enabled": "false",
        "yarn.nodemanager.resource.memory-mb": "245760",
        "yarn.scheduler.maximum-allocation-mb": "245760"
      }
    },
    {
      "classification": "spark",
      "properties": {
        "maximizeResourceAllocation": "false"
      }
    },
    {
      "classification": "spark-defaults",
      "properties": {
        "spark.dynamicAllocation.enabled": "false",
        "spark.executor.instances": "44",
        "spark.yarn.executor.memoryOverhead": "3072",
        "spark.executor.memory": "13G",
        "spark.executor.cores": "2",
        "spark.yarn.driver.memoryOverhead": "3072",
        "spark.driver.memory": "13G",
        "spark.driver.cores": "2",
        "spark.default.parallelism": "352"
      }
    }
  ]
}
Notice the Spark configuration that comes along with it. When spinning up an EMR cluster, you might want to follow this guide to ensure your cluster's resources are used to their fullest.
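If it helps, here is a minimal sketch of the arithmetic behind those Spark numbers. It is my own illustration, not a Snowplow tool: the helper name is made up, and the "parallelism = executors x cores x 4" rule of thumb is the one from the guide. It also reproduces the settings for your current cluster, shown in full below.

# Illustrative sizing helper (an assumption, not part of Snowplow tooling).
def spark_sizing(nodes, yarn_mb_per_node, executor_mem_mb, overhead_mb, cores_per_executor):
    # Memory one executor claims from YARN: heap plus off-heap overhead.
    per_executor_mb = executor_mem_mb + overhead_mb
    executors_per_node = yarn_mb_per_node // per_executor_mb
    # Leave one slot free for the driver, which is sized like an executor here.
    executor_instances = nodes * executors_per_node - 1
    # Rule of thumb from the guide: 4 partitions per executor core.
    parallelism = executor_instances * cores_per_executor * 4
    return executor_instances, parallelism

# 3x r4.8xlarge, 240 GiB per node handed to YARN, 13G heap + 3072M overhead:
print(spark_sizing(3, 245760, 13 * 1024, 3072, 2))  # -> (44, 352)

# 1x r4.2xlarge, 56 GiB handed to YARN, 7G heap + 1024M overhead:
print(spark_sizing(1, 57344, 7 * 1024, 1024, 1))    # -> (6, 24)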
As a further example, for your current 1x r4.2xlarge cluster the Spark configuration would be:
{
  "configurations": [
    {
      "classification": "yarn-site",
      "properties": {
        "yarn.nodemanager.vmem-check-enabled": "false",
        "yarn.nodemanager.resource.memory-mb": "57344",
        "yarn.scheduler.maximum-allocation-mb": "57344"
      }
    },
    {
      "classification": "spark",
      "properties": {
        "maximizeResourceAllocation": "false"
      }
    },
    {
      "classification": "spark-defaults",
      "properties": {
        "spark.dynamicAllocation.enabled": "false",
        "spark.executor.instances": "6",
        "spark.yarn.executor.memoryOverhead": "1024",
        "spark.executor.memory": "7G",
        "spark.executor.cores": "1",
        "spark.yarn.driver.memoryOverhead": "1024",
        "spark.driver.memory": "7G",
        "spark.driver.cores": "1",
        "spark.default.parallelism": "24"
      }
    }
  ]
}
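One more thing to keep in mind: if you do bump the AMI version as suggested above and end up on Spark 2.3 or later, spark.yarn.executor.memoryOverhead and spark.yarn.driver.memoryOverhead have been renamed to spark.executor.memoryOverhead and spark.driver.memoryOverhead. The old names are still honored but are deprecated.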