Hello! I’m having some trouble using Dataflow Runner to run the Snowflake Loader. Here’s the command I’m running:
./dataflow-runner run-transient --emr-config ./snowflake/cluster.json --emr-playbook ./snowflake/playbook.json
And here’s the error it returns:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x8b2386]
goroutine 22 [running]:
github.com/aws/aws-sdk-go/aws/ec2metadata.(*EC2Metadata).GetMetadataWithContext(0x0, 0xe701e0, 0xc0003046a0, 0xc43f4d, 0x19, 0xb0, 0x400, 0x400, 0x40d7c6)
/home/travis/gopath/pkg/mod/github.com/aws/aws-sdk-go@v1.34.5/aws/ec2metadata/api.go:69 +0x146
github.com/aws/aws-sdk-go/aws/credentials/ec2rolecreds.requestCredList(0xe701e0, 0xc0003046a0, 0x0, 0xc000041ca0, 0x40ac6d, 0xb986a0, 0x4b6a20, 0x8b7a20)
/home/travis/gopath/pkg/mod/github.com/aws/aws-sdk-go@v1.34.5/aws/credentials/ec2rolecreds/ec2_role_provider.go:142 +0x72
github.com/aws/aws-sdk-go/aws/credentials/ec2rolecreds.(*EC2RoleProvider).RetrieveWithContext(0xc0002d8ae0, 0xe701e0, 0xc0003046a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/home/travis/gopath/pkg/mod/github.com/aws/aws-sdk-go@v1.34.5/aws/credentials/ec2rolecreds/ec2_role_provider.go:98 +0x84
github.com/aws/aws-sdk-go/aws/credentials.(*Credentials).singleRetrieve(0xc0002d8b10, 0xe701e0, 0xc0003046a0, 0x0, 0x0, 0x0, 0x0)
/home/travis/gopath/pkg/mod/github.com/aws/aws-sdk-go@v1.34.5/aws/credentials/credentials.go:261 +0x2d4
github.com/aws/aws-sdk-go/aws/credentials.(*Credentials).GetWithContext.func1(0x0, 0x0, 0x0, 0x0)
/home/travis/gopath/pkg/mod/github.com/aws/aws-sdk-go@v1.34.5/aws/credentials/credentials.go:244 +0x8a
github.com/aws/aws-sdk-go/internal/sync/singleflight.(*Group).doCall(0xc0002d8b20, 0xc000322060, 0x0, 0x0, 0xc0002d14a0)
/home/travis/gopath/pkg/mod/github.com/aws/aws-sdk-go@v1.34.5/internal/sync/singleflight/singleflight.go:97 +0x2e
created by github.com/aws/aws-sdk-go/internal/sync/singleflight.(*Group).DoChan
/home/travis/gopath/pkg/mod/github.com/aws/aws-sdk-go@v1.34.5/internal/sync/singleflight/singleflight.go:90 +0x2b4
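From the trace, the crash seems to happen while the AWS SDK resolves the "iam" credentials from the EC2 instance metadata service (the `ec2metadata`/`ec2rolecreds` frames). Here’s the sanity check I can run from the instance — 169.254.169.254 is the standard IMDS address, so this should list the attached role if the metadata service is reachable:

```shell
# Ask the EC2 instance metadata service which IAM role is attached;
# prints a fallback message if the service doesn't answer
curl -s --max-time 2 http://169.254.169.254/latest/meta-data/iam/security-credentials/ \
  || echo "instance metadata service not reachable"
```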
For reference, this is my cluster.json file:
{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/ClusterConfig/avro/1-1-0",
  "data": {
    "name": "dataflow-runner - snowflake transformer",
    "logUri": "s3://path-to-my-logs/",
    "region": "us-east-1",
    "credentials": {
      "accessKeyId": "iam",
      "secretAccessKey": "iam"
    },
    "roles": {
      "jobflow": "EMR_EC2_DefaultRole",
      "service": "EMR_DefaultRole"
    },
    "ec2": {
      "amiVersion": "5.9.0",
      "keyName": "snowflake-loader",
      "location": {
        "vpc": {
          "subnetId": null
        }
      },
      "instances": {
        "master": {
          "type": "m2.xlarge"
        },
        "core": {
          "type": "m2.xlarge",
          "count": 1
        },
        "task": {
          "type": "m1.medium",
          "count": 0,
          "bid": "0.015"
        }
      }
    },
    "tags": [ ],
    "bootstrapActionConfigs": [ ],
    "configurations": [
      {
        "classification": "core-site",
        "properties": {
          "io.file.buffer.size": "65536"
        }
      },
      {
        "classification": "mapred-site",
        "properties": {
          "mapreduce.user.classpath.first": "true"
        }
      },
      {
        "classification": "yarn-site",
        "properties": {
          "yarn.resourcemanager.am.max-attempts": "1"
        }
      },
      {
        "classification": "spark",
        "properties": {
          "maximizeResourceAllocation": "true"
        }
      }
    ],
    "applications": [ "Hadoop", "Spark" ]
  }
}
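To rule out a malformed file, I can confirm the config parses and inspect the `subnetId` value. Here’s a sketch using an inline sample with the same structure — for the real check, point the path at ./snowflake/cluster.json instead:

```shell
# Sanity-check that a cluster config parses as JSON and print subnetId
# (inline sample here; swap the path for ./snowflake/cluster.json)
cat > /tmp/cluster-sample.json <<'EOF'
{"data":{"ec2":{"location":{"vpc":{"subnetId":null}}}}}
EOF
python3 - /tmp/cluster-sample.json <<'PY'
import json, sys
with open(sys.argv[1]) as f:
    cfg = json.load(f)  # raises ValueError on malformed JSON
print(cfg["data"]["ec2"]["location"]["vpc"]["subnetId"])  # prints None (JSON null)
PY
```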
And this is my playbook.json file:
{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/PlaybookConfig/avro/1-0-1",
  "data": {
    "region": "us-east-1",
    "credentials": {
      "accessKeyId": "iam",
      "secretAccessKey": "iam"
    },
    "steps": [
      {
        "type": "CUSTOM_JAR",
        "name": "Snowflake Transformer",
        "actionOnFailure": "CANCEL_AND_WAIT",
        "jar": "command-runner.jar",
        "arguments": [
          "spark-submit",
          "--conf",
          "spark.hadoop.mapreduce.job.outputformat.class=com.snowplowanalytics.snowflake.transformer.S3OutputFormat",
          "--deploy-mode",
          "cluster",
          "--class",
          "com.snowplowanalytics.snowflake.transformer.Main",
          "s3://snowplow-hosted-assets/4-storage/snowflake-loader/snowplow-snowflake-transformer-0.8.0.jar",
          "--config",
          "{{base64File "./snowflake/snowflake-loader-config.json"}}",
          "--resolver",
          "{{base64File "./snowflake/iglu_resolver.json"}}",
          "--events-manifest",
          "{{base64File "./snowflake/events_manifest.json"}}"
        ]
      },
      {
        "type": "CUSTOM_JAR",
        "name": "Snowflake Loader",
        "actionOnFailure": "CANCEL_AND_WAIT",
        "jar": "s3://snowplow-hosted-assets/4-storage/snowflake-loader/snowplow-snowflake-loader-0.8.0.jar",
        "arguments": [
          "load",
          "--base64",
          "--config",
          "{{base64File "./snowflake/snowflake-loader-config.json"}}",
          "--resolver",
          "{{base64File "./snowflake/iglu_resolver.json"}}"
        ]
      }
    ],
    "tags": [ ]
  }
}
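For clarity on the template calls: as I understand it, {{base64File "..."}} just inlines the base64 encoding of the named file’s contents, so each of those arguments should expand to roughly what this produces (shown with a tiny sample file; the real ones are the loader config, resolver, and events manifest):

```shell
# Roughly what {{base64File "path"}} expands to: the file's bytes,
# base64-encoded (sample content here, not the real config)
printf '%s' '{"a":1}' > /tmp/sample.json
base64 < /tmp/sample.json   # prints eyJhIjoxfQ==
```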
For additional context: I’ve already set up a Snowflake storage integration and run the Snowflake Loader’s setup CLI action, and I’m already successfully sinking events to S3. I’m starting dataflow-runner on an EC2 instance with an IAM role that grants it EMR access.
Any thoughts on where I might be going wrong would be greatly appreciated!