Dataflow-runner and subnetId

Maxim_Volembovschii · November 10, 2021, 5:31pm

Hi there,

I'm trying to run Dataflow Runner with the command below:

./dataflow-runner --log-level debug run-transient --emr-config cluster.json --emr-playbook playbook.json

cluster.json has pretty much the default settings from the setup guide (Setup - Snowplow Docs).

However, I am getting the error below:

ERRO[0000] At least one of Availability Zone and Subnet id is required 
At least one of Availability Zone and Subnet id is required

cluster.json has the following content:

{
   "schema":"iglu:com.snowplowanalytics.dataflowrunner/ClusterConfig/avro/1-1-0",
   "data":{
      "name":"dataflow-runner - snowflake transformer",
      "logUri":"s3://snwplw-maxv-snowflake-storage-integration/logs/",
      "region":"us-east-1",
      "credentials":null,
      "roles":{
         "jobflow":"EMR_EC2_DefaultRole",
         "service":"EMR_DefaultRole"
      },
      "ec2":{
         "amiVersion":"5.9.0",
         "keyName":"Q4WEB-TST-VA.pem",
         "location":{
            "vpc":{
               "subnetId":"subnet-0785479c0a323daeb"
            }
         },
         "instances":{
            "master":{
               "type":"m2.xlarge"
            },
            "core":{
               "type":"m2.xlarge",
               "count":1
            },
            "task":{
               "type":"m1.medium",
               "count":0,
               "bid":"0.015"
            }
         }
      },
      "tags":[ ],
      "bootstrapActionConfigs":[ ],
      "configurations":[
         {
            "classification":"core-site",
            "properties":{
               "Io.file.buffer.size":"65536"
            }
         },
         {
            "classification":"mapred-site",
            "properties":{
               "Mapreduce.user.classpath.first":"true"
            }
         },
         {
            "classification":"yarn-site",
            "properties":{
               "yarn.resourcemanager.am.max-attempts":"1"
            }
         },
         {
            "classification":"spark",
            "properties":{
               "maximizeResourceAllocation":"true"
            }
         }
      ],
      "applications":[ "Hadoop", "Spark" ]
   }
}

I'm not sure why I am getting this error, given that subnetId is set in the config.

Thanks

pramod.niralakeri · January 10, 2022, 11:59am

Set "credentials" to an object instead of null:

    "credentials": {
      "accessKeyId": "xyz",
      "secretAccessKey": "abc"
    }
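
For anyone checking their own file before a run, here is a minimal Python sketch of a local sanity check for the two things this thread touches on: the "credentials" block and the EC2 location. It is not how dataflow-runner validates its config. The function name check_cluster_config is hypothetical, and the "classic" / "availabilityZone" keys are an assumption based on the ClusterConfig layout. Whether the null credentials actually cause the subnet error is not confirmed in this thread.

```python
import json


def check_cluster_config(config):
    """Return a list of human-readable problems found in a cluster config dict.

    Hypothetical helper for illustration only; it mirrors the points raised
    in this thread, not dataflow-runner's own validation logic.
    """
    problems = []
    data = config.get("data") or {}

    # The reply suggests an object with both keys rather than null.
    creds = data.get("credentials")
    if not isinstance(creds, dict):
        problems.append("credentials should be an object, got %r" % (creds,))
    else:
        for key in ("accessKeyId", "secretAccessKey"):
            if not creds.get(key):
                problems.append("credentials.%s is missing or empty" % key)

    # The error message asks for a subnet id or an availability zone.
    # Assumption: the zone lives under location.classic.availabilityZone.
    location = (data.get("ec2") or {}).get("location") or {}
    subnet_id = (location.get("vpc") or {}).get("subnetId")
    zone = (location.get("classic") or {}).get("availabilityZone")
    if not subnet_id and not zone:
        problems.append("neither vpc.subnetId nor classic.availabilityZone is set")

    return problems


# A trimmed-down version of the config from the question.
example = json.loads(
    '{"data": {"credentials": null, "ec2": {"location": '
    '{"vpc": {"subnetId": "subnet-0785479c0a323daeb"}}}}}'
)
for problem in check_cluster_config(example):
    print(problem)
```

To check a real file, load it with json.load(open("cluster.json")) and pass the result to the function. An empty list means neither of the two checks found anything.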
