Hello! I’m having some trouble using Dataflow Runner to run the Snowflake Loader. Here’s the command I’m running:
./dataflow-runner run-transient --emr-config ./snowflake/cluster.json --emr-playbook ./snowflake/playbook.json
And here’s the error it returns:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x8b2386]
goroutine 22 [running]:
github.com/aws/aws-sdk-go/aws/ec2metadata.(*EC2Metadata).GetMetadataWithContext(0x0, 0xe701e0, 0xc0003046a0, 0xc43f4d, 0x19, 0xb0, 0x400, 0x400, 0x40d7c6)
/home/travis/gopath/pkg/mod/github.com/aws/aws-sdk-go@v1.34.5/aws/ec2metadata/api.go:69 +0x146
github.com/aws/aws-sdk-go/aws/credentials/ec2rolecreds.requestCredList(0xe701e0, 0xc0003046a0, 0x0, 0xc000041ca0, 0x40ac6d, 0xb986a0, 0x4b6a20, 0x8b7a20)
/home/travis/gopath/pkg/mod/github.com/aws/aws-sdk-go@v1.34.5/aws/credentials/ec2rolecreds/ec2_role_provider.go:142 +0x72
github.com/aws/aws-sdk-go/aws/credentials/ec2rolecreds.(*EC2RoleProvider).RetrieveWithContext(0xc0002d8ae0, 0xe701e0, 0xc0003046a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/home/travis/gopath/pkg/mod/github.com/aws/aws-sdk-go@v1.34.5/aws/credentials/ec2rolecreds/ec2_role_provider.go:98 +0x84
github.com/aws/aws-sdk-go/aws/credentials.(*Credentials).singleRetrieve(0xc0002d8b10, 0xe701e0, 0xc0003046a0, 0x0, 0x0, 0x0, 0x0)
/home/travis/gopath/pkg/mod/github.com/aws/aws-sdk-go@v1.34.5/aws/credentials/credentials.go:261 +0x2d4
github.com/aws/aws-sdk-go/aws/credentials.(*Credentials).GetWithContext.func1(0x0, 0x0, 0x0, 0x0)
/home/travis/gopath/pkg/mod/github.com/aws/aws-sdk-go@v1.34.5/aws/credentials/credentials.go:244 +0x8a
github.com/aws/aws-sdk-go/internal/sync/singleflight.(*Group).doCall(0xc0002d8b20, 0xc000322060, 0x0, 0x0, 0xc0002d14a0)
/home/travis/gopath/pkg/mod/github.com/aws/aws-sdk-go@v1.34.5/internal/sync/singleflight/singleflight.go:97 +0x2e
created by github.com/aws/aws-sdk-go/internal/sync/singleflight.(*Group).DoChan
/home/travis/gopath/pkg/mod/github.com/aws/aws-sdk-go@v1.34.5/internal/sync/singleflight/singleflight.go:90 +0x2b4
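From the trace, the crash seems to happen while the AWS SDK resolves the "iam" credentials from the EC2 instance metadata service (the `ec2metadata`/`ec2rolecreds` frames). Here’s the sanity check I can run from the instance — 169.254.169.254 is the standard IMDS address, so this should list the attached role if the metadata service is reachable:

```shell
# Ask the EC2 instance metadata service which IAM role is attached;
# prints a fallback message if the service doesn't answer
curl -s --max-time 2 http://169.254.169.254/latest/meta-data/iam/security-credentials/ \
  || echo "instance metadata service not reachable"
```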
For reference, this is my cluster.json file:
{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/ClusterConfig/avro/1-1-0",
  "data": {
    "name": "dataflow-runner - snowflake transformer",
    "logUri": "s3://path-to-my-logs/",
    "region": "us-east-1",
    "credentials": {
      "accessKeyId": "iam",
      "secretAccessKey": "iam"
    },
    "roles": {
      "jobflow": "EMR_EC2_DefaultRole",
      "service": "EMR_DefaultRole"
    },
    "ec2": {
      "amiVersion": "5.9.0",
      "keyName": "snowflake-loader",
      "location": {
        "vpc": {
          "subnetId": null
        }
      },
      "instances": {
        "master": {
          "type": "m2.xlarge"
        },
        "core": {
          "type": "m2.xlarge",
          "count": 1
        },
        "task": {
          "type": "m1.medium",
          "count": 0,
          "bid": "0.015"
        }
      }
    },
    "tags": [ ],
    "bootstrapActionConfigs": [ ],
    "configurations": [
      {
        "classification": "core-site",
        "properties": {
          "io.file.buffer.size": "65536"
        }
      },
      {
        "classification": "mapred-site",
        "properties": {
          "mapreduce.user.classpath.first": "true"
        }
      },
      {
        "classification": "yarn-site",
        "properties": {
          "yarn.resourcemanager.am.max-attempts": "1"
        }
      },
      {
        "classification": "spark",
        "properties": {
          "maximizeResourceAllocation": "true"
        }
      }
    ],
    "applications": [ "Hadoop", "Spark" ]
  }
}
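To rule out a malformed file, I can confirm the config parses and inspect the `subnetId` value. Here’s a sketch using an inline sample with the same structure — for the real check, point the path at ./snowflake/cluster.json instead:

```shell
# Sanity-check that a cluster config parses as JSON and print subnetId
# (inline sample here; swap the path for ./snowflake/cluster.json)
cat > /tmp/cluster-sample.json <<'EOF'
{"data":{"ec2":{"location":{"vpc":{"subnetId":null}}}}}
EOF
python3 - /tmp/cluster-sample.json <<'PY'
import json, sys
with open(sys.argv[1]) as f:
    cfg = json.load(f)  # raises ValueError on malformed JSON
print(cfg["data"]["ec2"]["location"]["vpc"]["subnetId"])  # prints None (JSON null)
PY
```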
And this is my playbook.json file:
{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/PlaybookConfig/avro/1-0-1",
  "data": {
    "region": "us-east-1",
    "credentials": {
      "accessKeyId": "iam",
      "secretAccessKey": "iam"
    },
    "steps": [
      {
        "type": "CUSTOM_JAR",
        "name": "Snowflake Transformer",
        "actionOnFailure": "CANCEL_AND_WAIT",
        "jar": "command-runner.jar",
        "arguments": [
          "spark-submit",
          "--conf",
          "spark.hadoop.mapreduce.job.outputformat.class=com.snowplowanalytics.snowflake.transformer.S3OutputFormat",
          "--deploy-mode",
          "cluster",
          "--class",
          "com.snowplowanalytics.snowflake.transformer.Main",
          "s3://snowplow-hosted-assets/4-storage/snowflake-loader/snowplow-snowflake-transformer-0.8.0.jar",
          "--config",
          "{{base64File "./snowflake/snowflake-loader-config.json"}}",
          "--resolver",
          "{{base64File "./snowflake/iglu_resolver.json"}}",
          "--events-manifest",
          "{{base64File "./snowflake/events_manifest.json"}}"
        ]
      },
      {
        "type": "CUSTOM_JAR",
        "name": "Snowflake Loader",
        "actionOnFailure": "CANCEL_AND_WAIT",
        "jar": "s3://snowplow-hosted-assets/4-storage/snowflake-loader/snowplow-snowflake-loader-0.8.0.jar",
        "arguments": [
          "load",
          "--base64",
          "--config",
          "{{base64File "./snowflake/snowflake-loader-config.json"}}",
          "--resolver",
          "{{base64File "./snowflake/iglu_resolver.json"}}"
        ]
      }
    ],
    "tags": [ ]
  }
}
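For clarity on the template calls: as I understand it, {{base64File "..."}} just inlines the base64 encoding of the named file’s contents, so each of those arguments should expand to roughly what this produces (shown with a tiny sample file; the real ones are the loader config, resolver, and events manifest):

```shell
# Roughly what {{base64File "path"}} expands to: the file's bytes,
# base64-encoded (sample content here, not the real config)
printf '%s' '{"a":1}' > /tmp/sample.json
base64 < /tmp/sample.json   # prints eyJhIjoxfQ==
```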
For additional context: I’ve already set up a Snowflake storage integration and run the Snowflake Loader’s setup CLI action, and I’m already successfully sinking events to S3. I’m starting dataflow-runner on an EC2 instance with an IAM role that grants it EMR access.
Any thoughts on where I might be going wrong would be greatly appreciated!