Snowflake Transformer failing exitCode: 15

Hi, I’m trying to set up the SnowflakeDB Loader, but it’s currently failing in the “Snowflake Transformer” step (“Staging enriched data” is successful).

I followed the Quick start AWS guide and managed to get everything working with the Postgres loader and am now trying to replace it with Snowflake. (Using the S3 loader to get data into my bucket).

Some precursor thoughts:

I’m a bit confused about the setup for the roleArn.
The setup guide (Setup - Snowplow Docs) says to follow the Snowflake documentation for setting up a storage integration.
I did that (I’ve done it before for other services and got it working).
But then the guide says to create another policy with similar statements but a different trust relationship, one that doesn’t include the Snowflake role, only EMR_EC2_DefaultRole, and to use this for the roleArn.
Obviously that won’t work on its own, as the role then cannot be assumed by Snowflake.
So I’m wondering whether this policy should be combined with the Snowflake one, and if so, whether the trust relationship needs to include both EMR_EC2_DefaultRole and my Snowflake role? (I did this for now, to be able to run the setup.)
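
For reference, this is roughly what my combined trust relationship looks like now. <MY_ACCOUNT_ID>, <STORAGE_AWS_IAM_USER_ARN> and <STORAGE_AWS_EXTERNAL_ID> are placeholders (the latter two come from running DESC INTEGRATION in Snowflake), so treat it as a sketch of what I did rather than a verified setup:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<MY_ACCOUNT_ID>:role/EMR_EC2_DefaultRole"
            },
            "Action": "sts:AssumeRole"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "<STORAGE_AWS_IAM_USER_ARN>"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "<STORAGE_AWS_EXTERNAL_ID>"
                }
            }
        }
    ]
}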

The setup guide also mentions: “Don’t forget to use 1-0-3 version of configuration schema.” I’m not sure where that is supposed to be used? It doesn’t seem related to the loader config? And if it’s just for resolver.json, I suppose that’s already set up.

I also created a DynamoDB table (via a Terraform aws_dynamodb_table resource), though I’m unsure whether it’s correct. I only gave it this config:

  name           = "snowflake-event-manifest"
  billing_mode   = "PROVISIONED"
  read_capacity  = 20
  write_capacity = 20
  hash_key       = "RunId"

  attribute {
    name = "RunId"
    type = "S"
  }

But it has an item count of 0.
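
In case the manifest setup matters here, the table itself can be checked with something along these lines (just a sanity check, not part of the pipeline):

aws dynamodb describe-table --table-name snowflake-event-manifest --region eu-west-1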

Then, to run everything I did this:

java -jar snowplow-snowflake-loader-0.8.2.jar setup --config ./config.json --resolver ./resolver.json

and then:

./dataflow-runner run-transient --emr-config cluster.json --emr-playbook playbook.json
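
Since both cluster.json and playbook.json use "env" for the credentials, I export the AWS keys in the shell before running, roughly like this (placeholder values):

export AWS_ACCESS_KEY_ID="<MY_ACCESS_KEY_ID>"
export AWS_SECRET_ACCESS_KEY="<MY_SECRET_ACCESS_KEY>"
./dataflow-runner run-transient --emr-config cluster.json --emr-playbook playbook.json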

This is my config.json:

{
    "schema": "iglu:com.snowplowanalytics.snowplow.storage/snowflake_config/jsonschema/1-0-3",
    "data": {
        "name": "Snowflake Config",
        "awsRegion": "eu-west-1",
        "auth": {
            "integrationName": "SNOWPLOW_S3_INTEGRATION",
            "roleArn": "<ROLE_ARN>",
            "sessionDuration": 900
        },
        "manifest": "snowflake-event-manifest",
        "snowflakeRegion": "eu-west-1",
        "database": "SNOWPLOW",
        "input": "s3://MYBUCKET/archive/enriched/",
        "stage": "STAGE_SNOWPLOW_S3_INTEGRATION",
        "badOutputUrl": "s3://MYBUCKET/archive/snowflake/badrow/",
        "stageUrl": "s3://MYBUCKET/archive/snowflake/transformed/",
        "warehouse": "snowplow_warehouse",
        "schema": "atomic",
        "account": "MYSNOWFLAKEACCOUNT",
        "username": "SNOWPLOW_LOADER",
        "password": {
            "ec2ParameterStore": {
                "parameterName": "/analytics/snowflake/snowplow/loader"
            }
        },
        "maxError": 1,
        "purpose": "ENRICHED_EVENTS"
    }
}

And this is my resolver.json:

{
    "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-3",
    "data": {
        "cacheSize": 500,
        "repositories": [
            {
                "name": "Iglu Central",
                "priority": 0,
                "vendorPrefixes": [
                    "com.snowplowanalytics"
                ],
                "connection": {
                    "http": {
                        "uri": "http://iglucentral.com"
                    }
                }
            },
            {
                "name": "MY-COMPANY iglu schema repository",
                "priority": 1,
                "vendorPrefixes": [
                    "se.MY-COMPANY"
                ],
                "connection": {
                    "http": {
                        "uri": "http://MY-COMPANY-iglu-schemas.s3-website-eu-west-1.amazonaws.com"
                    }
                }
            }
        ]
    }
}

And this is my events_manifest.json:

{
    "schema": "iglu:com.snowplowanalytics.snowplow.storage/amazon_dynamodb_config/jsonschema/2-0-0",
    "data": {
        "name": "Snowflake deduplication config",
        "auth": null,
        "awsRegion": "eu-west-1",
        "dynamodbTable": "snowflake-event-manifest",
        "id": "MY ID",
        "purpose": "EVENTS_MANIFEST"
    }
}

And this is my cluster.json:

{
    "schema": "iglu:com.snowplowanalytics.dataflowrunner/ClusterConfig/avro/1-1-0",
    "data": {
        "name": "dataflow-runner - snowflake transformer",
        "logUri": "s3://MYBUCKET/logs/",
        "region": "eu-west-1",
        "credentials": {
            "accessKeyId": "env",
            "secretAccessKey": "env"
        },
        "roles": {
            "jobflow": "EMR_EC2_DefaultRole",
            "service": "EMR_DefaultRole"
        },
        "ec2": {
            "amiVersion": "5.9.0",
            "keyName": "MYKEYNAME",
            "location": {
                "vpc": {
                    "subnetId": "MYSUBNET"
                }
            },
            "instances": {
                "master": {
                    "type": "m2.xlarge"
                },
                "core": {
                    "type": "m2.xlarge",
                    "count": 1
                },
                "task": {
                    "type": "m1.medium",
                    "count": 0,
                    "bid": "0.015"
                }
            }
        },
        "tags": [],
        "bootstrapActionConfigs": [],
        "configurations": [
            {
                "classification": "core-site",
                "properties": {
                    "Io.file.buffer.size": "65536"
                }
            },
            {
                "classification": "mapred-site",
                "properties": {
                    "Mapreduce.user.classpath.first": "true"
                }
            },
            {
                "classification": "yarn-site",
                "properties": {
                    "yarn.resourcemanager.am.max-attempts": "1"
                }
            },
            {
                "classification": "spark",
                "properties": {
                    "maximizeResourceAllocation": "true"
                }
            }
        ],
        "applications": [
            "Hadoop",
            "Spark"
        ]
    }
}

And this is my playbook.json (I removed "--conf", "spark.hadoop.mapreduce.job.outputformat.class=com.snowplowanalytics.snowflake.transformer.S3OutputFormat" from the transformer step, as recommended in “Upgraded to snowflake loader 0.8.0 but data is not loaded”, but it made no difference):

{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/PlaybookConfig/avro/1-0-1",
  "data": {
    "region": "eu-west-1",
    "credentials": {
      "accessKeyId": "env",
      "secretAccessKey": "env"
    },
    "steps": [
      {
        "type": "CUSTOM_JAR",
        "name": "Staging enriched data",
        "actionOnFailure": "CANCEL_AND_WAIT",
        "jar": "/usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar",
        "arguments": [
          "--src",
          "s3://MYBUCKET/enriched/",
          "--dest",
          "s3://MYBUCKET/archive/snowflake/transformed/run={{nowWithFormat "2006-01-02-15-04-05"}}/",
          "--s3Endpoint",
          "s3.amazonaws.com",
          "--srcPattern",
          ".*\\.gz",
          "--deleteOnSuccess",
          "--s3ServerSideEncryption"
        ]
      },
      {
        "type": "CUSTOM_JAR",
        "name": "Snowflake Transformer",
        "actionOnFailure": "CANCEL_AND_WAIT",
        "jar": "command-runner.jar",
        "arguments": [
          "spark-submit",
          "--deploy-mode",
          "cluster",
          "--class",
          "com.snowplowanalytics.snowflake.transformer.Main",
          "s3://snowplow-hosted-assets/4-storage/snowflake-loader/snowplow-snowflake-transformer-0.8.2.jar",
          "--config",
          "{{base64File "./config.json"}}",
          "--resolver",
          "{{base64File "./resolver.json"}}",
          "--events-manifest",
          "{{base64File "./events_manifest.json"}}"
        ]
      },
      {
        "type": "CUSTOM_JAR",
        "name": "Snowflake Loader",
        "actionOnFailure": "CANCEL_AND_WAIT",
        "jar": "s3://snowplow-hosted-assets/4-storage/snowflake-loader/snowplow-snowflake-loader-0.8.2.jar",
        "arguments": [
          "load",
          "--base64",
          "--config",
          "{{base64File "./config.json"}}",
          "--resolver",
          "{{base64File "./resolver.json"}}"
        ]
      }
    ],
    "tags": []
  }
}

Creating reply with error logs as the post became too long…


Error logs in the “Snowflake Transformer” step:

These are the contents of stderr in the “Snowflake Transformer” step (there is no syslog):

Warning: Skip remote jar s3://snowplow-hosted-assets/4-storage/snowflake-loader/snowplow-snowflake-transformer-0.8.2.jar.
21/12/15 15:54:34 INFO RMProxy: Connecting to ResourceManager at ip-XXX-XX-XX-XX.eu-west-1.compute.internal/XXX.XX.XX.XX:8032
21/12/15 15:54:34 INFO Client: Requesting a new application from cluster with 1 NodeManagers
21/12/15 15:54:34 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (14336 MB per container)
21/12/15 15:54:34 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
21/12/15 15:54:34 INFO Client: Setting up container launch context for our AM
21/12/15 15:54:34 INFO Client: Setting up the launch environment for our AM container
21/12/15 15:54:34 INFO Client: Preparing resources for our AM container
21/12/15 15:54:37 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
21/12/15 15:54:42 INFO Client: Uploading resource file:/mnt/tmp/spark-fb161e5d-4aa8-4d9b-9a3b-fb8550997ea1/__spark_libs__366466599803720464.zip -> hdfs://ip-XXX-XX-XX-XX.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1639583424812_0002/__spark_libs__366466599803720464.zip
21/12/15 15:54:47 WARN RoleMappings: Found no mappings configured with 'fs.s3.authorization.roleMapping', credentials resolution may not work as expected
21/12/15 15:54:48 INFO Client: Uploading resource s3://snowplow-hosted-assets/4-storage/snowflake-loader/snowplow-snowflake-transformer-0.8.2.jar -> hdfs://ip-XXX-XX-XX-XX.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1639583424812_0002/snowplow-snowflake-transformer-0.8.2.jar
21/12/15 15:54:48 INFO S3NativeFileSystem: Opening 's3://snowplow-hosted-assets/4-storage/snowflake-loader/snowplow-snowflake-transformer-0.8.2.jar' for reading
21/12/15 15:54:50 INFO Client: Uploading resource file:/mnt/tmp/spark-fb161e5d-4aa8-4d9b-9a3b-fb8550997ea1/__spark_conf__1620086449972022347.zip -> hdfs://ip-XXX-XX-XX-XX.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1639583424812_0002/__spark_conf__.zip
21/12/15 15:54:50 INFO SecurityManager: Changing view acls to: hadoop
21/12/15 15:54:50 INFO SecurityManager: Changing modify acls to: hadoop
21/12/15 15:54:50 INFO SecurityManager: Changing view acls groups to: 
21/12/15 15:54:50 INFO SecurityManager: Changing modify acls groups to: 
21/12/15 15:54:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()
21/12/15 15:54:50 INFO Client: Submitting application application_1639583424812_0002 to ResourceManager
21/12/15 15:54:50 INFO YarnClientImpl: Submitted application application_1639583424812_0002
21/12/15 15:54:51 INFO Client: Application report for application_1639583424812_0002 (state: ACCEPTED)
21/12/15 15:54:51 INFO Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1639583690641
	 final status: UNDEFINED
	 tracking URL: http://ip-XXX-XX-XX-XX.eu-west-1.compute.internal:20888/proxy/application_1639583424812_0002/
	 user: hadoop
21/12/15 15:54:52 INFO Client: Application report for application_1639583424812_0002 (state: ACCEPTED)
21/12/15 15:54:53 INFO Client: Application report for application_1639583424812_0002 (state: ACCEPTED)
21/12/15 15:54:54 INFO Client: Application report for application_1639583424812_0002 (state: ACCEPTED)
21/12/15 15:54:55 INFO Client: Application report for application_1639583424812_0002 (state: ACCEPTED)
21/12/15 15:54:56 INFO Client: Application report for application_1639583424812_0002 (state: ACCEPTED)
21/12/15 15:54:57 INFO Client: Application report for application_1639583424812_0002 (state: ACCEPTED)
21/12/15 15:54:58 INFO Client: Application report for application_1639583424812_0002 (state: FAILED)
21/12/15 15:54:58 INFO Client: 
	 client token: N/A
	 diagnostics: Application application_1639583424812_0002 failed 1 times due to AM Container for appattempt_1639583424812_0002_000001 exited with  exitCode: 15
For more detailed output, check application tracking page:http://ip-XXX-XX-XX-XX.eu-west-1.compute.internal:8088/cluster/app/application_1639583424812_0002Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1639583424812_0002_01_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
	at org.apache.hadoop.util.Shell.run(Shell.java:479)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 15
Failing this attempt. Failing the application.
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1639583690641
	 final status: FAILED
	 tracking URL: http://ip-XXX-XX-XX-XX.eu-west-1.compute.internal:8088/cluster/app/application_1639583424812_0002
	 user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1639583424812_0002 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/12/15 15:54:58 INFO ShutdownHookManager: Shutdown hook called
21/12/15 15:54:58 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-fb161e5d-4aa8-4d9b-9a3b-fb8550997ea1
Command exiting with ret '1'

Which doesn’t say a lot. The stderr contents of application_1639575072223_0002 / appattempt_1639575072223_0002_000001, which failed above, are:

21/10/05 13:19:33 WARN SparkConf: The configuration key 'spark.yarn.executor.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.executor.memoryOverhead' instead.
21/10/05 13:19:33 WARN SparkConf: The configuration key 'spark.yarn.driver.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.driver.memoryOverhead' instead.
21/10/05 13:19:33 WARN SparkConf: The configuration key 'spark.yarn.executor.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.executor.memoryOverhead' instead.
21/10/05 13:19:33 WARN SparkConf: The configuration key 'spark.yarn.driver.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.driver.memoryOverhead' instead.
21/10/05 13:19:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/10/05 13:19:34 WARN DependencyUtils: Skip remote jar s3://snowplow-hosted-assets-eu-central-1/4-storage/rdb-shredder/snowplow-rdb-shredder-1.1.0.jar.
21/10/05 13:19:34 INFO RMProxy: Connecting to ResourceManager at ip-XX-X-X-XXX.eu-west-1.compute.internal/10.0.1.236:8032
21/10/05 13:19:35 INFO Client: Requesting a new application from cluster with 1 NodeManagers
21/10/05 13:19:35 INFO Configuration: resource-types.xml not found
21/10/05 13:19:35 INFO ResourceUtils: Unable to find 'resource-types.xml'.
21/10/05 13:19:35 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (57344 MB per container)
21/10/05 13:19:35 INFO Client: Will allocate AM container, with 8192 MB memory including 1024 MB overhead
21/10/05 13:19:35 INFO Client: Setting up container launch context for our AM
21/10/05 13:19:35 INFO Client: Setting up the launch environment for our AM container
21/10/05 13:19:35 INFO Client: Preparing resources for our AM container
21/10/05 13:19:35 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
21/10/05 13:19:38 INFO Client: Uploading resource file:/mnt/tmp/spark-fd6d0cb6-4209-44ac-be5e-2523ba827f63/__spark_libs__6575172148348879032.zip -> hdfs://ip-XX-X-X-XXX.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1633421637273_0004/__spark_libs__6575172148348879032.zip
21/10/05 13:19:46 INFO ClientConfigurationFactory: Set initial getObject socket timeout to 2000 ms.
21/10/05 13:19:46 INFO Client: Uploading resource s3://snowplow-hosted-assets-eu-central-1/4-storage/rdb-shredder/snowplow-rdb-shredder-1.1.0.jar -> hdfs://ip-XX-X-X-XXX.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1633421637273_0004/snowplow-rdb-shredder-1.1.0.jar
21/10/05 13:19:47 INFO S3NativeFileSystem: Opening 's3://snowplow-hosted-assets-eu-central-1/4-storage/rdb-shredder/snowplow-rdb-shredder-1.1.0.jar' for reading
21/10/05 13:19:49 INFO Client: Uploading resource file:/mnt/tmp/spark-fd6d0cb6-4209-44ac-be5e-2523ba827f63/__spark_conf__6303644031315215023.zip -> hdfs://ip-XX-X-X-XXX.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1633421637273_0004/__spark_conf__.zip
21/10/05 13:19:50 INFO SecurityManager: Changing view acls to: hadoop
21/10/05 13:19:50 INFO SecurityManager: Changing modify acls to: hadoop
21/10/05 13:19:50 INFO SecurityManager: Changing view acls groups to: 
21/10/05 13:19:50 INFO SecurityManager: Changing modify acls groups to: 
21/10/05 13:19:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()
21/10/05 13:19:50 INFO Client: Submitting application application_1633421637273_0004 to ResourceManager
21/10/05 13:19:50 INFO YarnClientImpl: Submitted application application_1633421637273_0004
21/10/05 13:19:51 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:19:51 INFO Client: 
	 client token: N/A
	 diagnostics: AM container is launched, waiting for AM container to Register with RM
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1633439990247
	 final status: UNDEFINED
	 tracking URL: http://ip-XX-X-X-XXX.eu-west-1.compute.internal:20888/proxy/application_1633421637273_0004/
	 user: hadoop
21/10/05 13:19:52 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:19:53 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:19:54 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:19:55 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:19:56 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:19:57 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:19:58 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:19:59 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:20:00 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:20:01 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:01 INFO Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: ip-XX-X-X-XX.eu-west-1.compute.internal
	 ApplicationMaster RPC port: 45947
	 queue: default
	 start time: 1633439990247
	 final status: UNDEFINED
	 tracking URL: http://ip-XX-X-X-XXX.eu-west-1.compute.internal:20888/proxy/application_1633421637273_0004/
	 user: hadoop
21/10/05 13:20:02 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:03 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:04 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:05 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:06 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:07 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:08 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:09 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:10 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:11 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:12 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:13 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:14 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:15 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:16 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:17 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:18 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:19 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:20 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:21 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:22 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:23 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:24 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:25 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:20:25 INFO Client: 
	 client token: N/A
	 diagnostics: AM container is launched, waiting for AM container to Register with RM
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: default
	 start time: 1633439990247
	 final status: UNDEFINED
	 tracking URL: http://ip-XX-X-X-XXX.eu-west-1.compute.internal:20888/proxy/application_1633421637273_0004/
	 user: hadoop
21/10/05 13:20:26 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:20:27 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:20:28 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:20:29 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:20:30 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:20:31 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:20:32 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:20:33 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:20:34 INFO Client: Application report for application_1633421637273_0004 (state: ACCEPTED)
21/10/05 13:20:35 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:35 INFO Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: ip-XX-X-X-XX.eu-west-1.compute.internal
	 ApplicationMaster RPC port: 37925
	 queue: default
	 start time: 1633439990247
	 final status: UNDEFINED
	 tracking URL: http://ip-XX-X-X-XXX.eu-west-1.compute.internal:20888/proxy/application_1633421637273_0004/
	 user: hadoop
21/10/05 13:20:36 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:37 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:38 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:39 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:40 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:41 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:42 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:43 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:44 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:45 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:46 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:47 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:48 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:49 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:50 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:51 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:52 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:53 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:54 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:55 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:56 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:57 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:58 INFO Client: Application report for application_1633421637273_0004 (state: RUNNING)
21/10/05 13:20:59 INFO Client: Application report for application_1633421637273_0004 (state: FINISHED)
21/10/05 13:20:59 INFO Client: 
	 client token: N/A
	 diagnostics: User class threw exception: java.io.IOException: Not a file: s3://MYBUCKET/enriched/archive/run=2021-10-05-15-18-18/archive
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:303)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205)
	at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
	at org.apache.spark.rdd.RDD.getNumPartitions(RDD.scala:292)
	at org.apache.spark.scheduler.DAGScheduler.createShuffleMapStage(DAGScheduler.scala:450)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$getOrCreateShuffleMapStage$1(DAGScheduler.scala:415)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at scala.collection.generic.TraversableForwarder.foreach(TraversableForwarder.scala:38)
	at scala.collection.generic.TraversableForwarder.foreach$(TraversableForwarder.scala:38)
	at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.getOrCreateShuffleMapStage(DAGScheduler.scala:408)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$getOrCreateParentStages$1(DAGScheduler.scala:531)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
	at scala.collection.TraversableLike.map(TraversableLike.scala:238)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
	at scala.collection.mutable.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:48)
	at scala.collection.SetLike.map(SetLike.scala:104)
	at scala.collection.SetLike.map$(SetLike.scala:104)
	at scala.collection.mutable.AbstractSet.map(Set.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.getOrCreateParentStages(DAGScheduler.scala:530)
	at org.apache.spark.scheduler.DAGScheduler.createResultStage(DAGScheduler.scala:517)
	at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1049)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2352)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2344)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2333)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:815)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2120)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2139)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2164)
	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1004)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:1003)
	at com.snowplowanalytics.snowplow.rdbloader.shredder.batch.ShredJob.run(ShredJob.scala:137)
	at com.snowplowanalytics.snowplow.rdbloader.shredder.batch.ShredJob$.$anonfun$run$32(ShredJob.scala:215)
	at com.snowplowanalytics.snowplow.rdbloader.shredder.batch.ShredJob$.$anonfun$run$32$adapted(ShredJob.scala:212)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at com.snowplowanalytics.snowplow.rdbloader.shredder.batch.ShredJob$.run(ShredJob.scala:212)
	at com.snowplowanalytics.snowplow.rdbloader.shredder.batch.Main$.main(Main.scala:41)
	at com.snowplowanalytics.snowplow.rdbloader.shredder.batch.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:728)

	 ApplicationMaster host: ip-XX-X-X-XX.eu-west-1.compute.internal
	 ApplicationMaster RPC port: 37925
	 queue: default
	 start time: 1633439990247
	 final status: FAILED
	 tracking URL: http://ip-XX-X-X-XXX.eu-west-1.compute.internal:20888/proxy/application_1633421637273_0004/
	 user: hadoop
21/10/05 13:20:59 INFO Client: Deleted staging directory hdfs://ip-XX-X-X-XXX.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1633421637273_0004
21/10/05 13:20:59 ERROR Client: Application diagnostics message: User class threw exception: java.io.IOException: Not a file: s3://MYBUCKET/enriched/archive/run=2021-10-05-15-18-18/archive
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:303)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205)
	at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:276)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:272)
	at org.apache.spark.rdd.RDD.getNumPartitions(RDD.scala:292)
	at org.apache.spark.scheduler.DAGScheduler.createShuffleMapStage(DAGScheduler.scala:450)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$getOrCreateShuffleMapStage$1(DAGScheduler.scala:415)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at scala.collection.generic.TraversableForwarder.foreach(TraversableForwarder.scala:38)
	at scala.collection.generic.TraversableForwarder.foreach$(TraversableForwarder.scala:38)
	at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.getOrCreateShuffleMapStage(DAGScheduler.scala:408)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$getOrCreateParentStages$1(DAGScheduler.scala:531)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
	at scala.collection.TraversableLike.map(TraversableLike.scala:238)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
	at scala.collection.mutable.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:48)
	at scala.collection.SetLike.map(SetLike.scala:104)
	at scala.collection.SetLike.map$(SetLike.scala:104)
	at scala.collection.mutable.AbstractSet.map(Set.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.getOrCreateParentStages(DAGScheduler.scala:530)
	at org.apache.spark.scheduler.DAGScheduler.createResultStage(DAGScheduler.scala:517)
	at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1049)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2352)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2344)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2333)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:815)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2120)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2139)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2164)
	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1004)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:1003)
	at com.snowplowanalytics.snowplow.rdbloader.shredder.batch.ShredJob.run(ShredJob.scala:137)
	at com.snowplowanalytics.snowplow.rdbloader.shredder.batch.ShredJob$.$anonfun$run$32(ShredJob.scala:215)
	at com.snowplowanalytics.snowplow.rdbloader.shredder.batch.ShredJob$.$anonfun$run$32$adapted(ShredJob.scala:212)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at com.snowplowanalytics.snowplow.rdbloader.shredder.batch.ShredJob$.run(ShredJob.scala:212)
	at com.snowplowanalytics.snowplow.rdbloader.shredder.batch.Main$.main(Main.scala:41)
	at com.snowplowanalytics.snowplow.rdbloader.shredder.batch.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:728)

Exception in thread "main" org.apache.spark.SparkException: Application application_1633421637273_0004 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1196)
	at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1587)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:936)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1015)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1024)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/10/05 13:20:59 INFO ShutdownHookManager: Shutdown hook called
21/10/05 13:20:59 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-fd6d0cb6-4209-44ac-be5e-2523ba827f63
21/10/05 13:20:59 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-e2ddd0a0-0eef-42a5-8ec4-021970df5632
Command exiting with ret '1'

I don’t see why it’s complaining about Not a file: s3://MYBUCKET/enriched/archive/, as I’m specifically telling the staging step in the playbook to use "s3://MYBUCKET/archive/snowflake/transformed/run={{nowWithFormat "2006-01-02-15-04-05"}}/" as the destination, and the config has "s3://MYBUCKET/archive/snowflake/transformed/" as the stageUrl?

When adding "--conf", "spark.hadoop.mapreduce.job.outputformat.class=com.snowplowanalytics.snowflake.transformer.S3OutputFormat" back to the transformer step, this is instead the error I get:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/hadoop/filecache/10/__spark_libs__7387419457958288554.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
21/12/16 07:43:19 INFO SignalUtils: Registered signal handler for TERM
21/12/16 07:43:19 INFO SignalUtils: Registered signal handler for HUP
21/12/16 07:43:19 INFO SignalUtils: Registered signal handler for INT
21/12/16 07:43:20 INFO ApplicationMaster: Preparing Local resources
21/12/16 07:43:21 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1639640321877_0002_000001
21/12/16 07:43:21 INFO SecurityManager: Changing view acls to: yarn,hadoop
21/12/16 07:43:21 INFO SecurityManager: Changing modify acls to: yarn,hadoop
21/12/16 07:43:21 INFO SecurityManager: Changing view acls groups to: 
21/12/16 07:43:21 INFO SecurityManager: Changing modify acls groups to: 
21/12/16 07:43:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, hadoop); groups with view permissions: Set(); users  with modify permissions: Set(yarn, hadoop); groups with modify permissions: Set()
21/12/16 07:43:21 INFO ApplicationMaster: Starting the user application in a separate Thread
21/12/16 07:43:21 INFO ApplicationMaster: Waiting for spark context initialization...
21/12/16 07:43:21 ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
	at com.snowplowanalytics.iglu.core.SchemaVer$Full.<init>(SchemaVer.scala:41)
	at com.snowplowanalytics.snowflake.transformer.Main$.<init>(Main.scala:36)
	at com.snowplowanalytics.snowflake.transformer.Main$.<clinit>(Main.scala)
	at com.snowplowanalytics.snowflake.transformer.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
21/12/16 07:43:21 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V)
21/12/16 07:43:22 ERROR ApplicationMaster: Uncaught exception: 
org.apache.spark.SparkException: Exception thrown in awaitResult: 
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:401)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:254)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:764)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:762)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: java.util.concurrent.ExecutionException: Boxed Error
	at scala.concurrent.impl.Promise$.resolver(Promise.scala:55)
	at scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:47)
	at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:244)
	at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
	at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:653)
Caused by: java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
	at com.snowplowanalytics.iglu.core.SchemaVer$Full.<init>(SchemaVer.scala:41)
	at com.snowplowanalytics.snowflake.transformer.Main$.<init>(Main.scala:36)
	at com.snowplowanalytics.snowflake.transformer.Main$.<clinit>(Main.scala)
	at com.snowplowanalytics.snowflake.transformer.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
21/12/16 07:43:22 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V)
21/12/16 07:43:22 INFO ApplicationMaster: Deleting staging directory hdfs://ip-XXX-XX-XX-XXX.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1639640321877_0002
21/12/16 07:43:22 INFO ShutdownHookManager: Shutdown hook called

Hi @kramstrom, without digging too far into it I believe this is likely to do with the EMR AMI version you are using here. 5.9.0 is very old now, and I think for this version of the Snowflake Loader we are using 6.4.0, so it might be worth giving that a try next to see whether it fixes the issues you are seeing.

I’ll see if I can get a definitive answer for you on which EMR version should be used, though!

Right, so changing to 6.4.0 produced another error, also in the transformer step. It looks like some Scala packages are missing?

Note: to get this AMI working I also had to change the S3 endpoint in the staging step to include the AWS region, like this: "s3-eu-west-1.amazonaws.com".
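
Concretely, the two changes were: in cluster.json

"amiVersion": "6.4.0",

and in the staging step of playbook.json

"--s3Endpoint",
"s3-eu-west-1.amazonaws.com",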

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/hadoop/filecache/10/__spark_libs__3729075268880071489.zip/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/hadoop/filecache/10/__spark_libs__3729075268880071489.zip/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/hadoop/filecache/10/__spark_libs__3729075268880071489.zip/RedshiftJDBC.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
21/12/16 10:43:07 INFO SignalUtils: Registering signal handler for TERM
21/12/16 10:43:07 INFO SignalUtils: Registering signal handler for HUP
21/12/16 10:43:07 INFO SignalUtils: Registering signal handler for INT
21/12/16 10:43:08 INFO SecurityManager: Changing view acls to: yarn,hadoop
21/12/16 10:43:08 INFO SecurityManager: Changing modify acls to: yarn,hadoop
21/12/16 10:43:08 INFO SecurityManager: Changing view acls groups to: 
21/12/16 10:43:08 INFO SecurityManager: Changing modify acls groups to: 
21/12/16 10:43:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, hadoop); groups with view permissions: Set(); users  with modify permissions: Set(yarn, hadoop); groups with modify permissions: Set()
21/12/16 10:43:08 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1639650606080_0002_000001
21/12/16 10:43:08 INFO ApplicationMaster: Starting the user application in a separate Thread
21/12/16 10:43:08 INFO ApplicationMaster: Waiting for spark context initialization...
21/12/16 10:43:09 ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: cats.kernel.Semigroup$.catsKernelMonoidForList()Lcats/kernel/Monoid;
java.lang.NoSuchMethodError: cats.kernel.Semigroup$.catsKernelMonoidForList()Lcats/kernel/Monoid;
	at com.monovore.decline.Help$.optionList(Help.scala:74)
	at com.monovore.decline.Help$.detail(Help.scala:105)
	at com.monovore.decline.Help$.fromCommand(Help.scala:50)
	at com.monovore.decline.Parser.<init>(Parser.scala:21)
	at com.monovore.decline.Command.parse(opts.scala:18)
	at com.snowplowanalytics.snowflake.core.Cli$Transformer$.parse(Cli.scala:101)
	at com.snowplowanalytics.snowflake.transformer.Main$.run(Main.scala:39)
	at cats.effect.IOApp.$anonfun$main$3(IOApp.scala:69)
	at cats.effect.internals.IOAppPlatform$.mainFiber(IOAppPlatform.scala:45)
	at cats.effect.internals.IOAppPlatform$.main(IOAppPlatform.scala:27)
	at cats.effect.IOApp.main(IOApp.scala:69)
	at cats.effect.IOApp.main$(IOApp.scala:68)
	at com.snowplowanalytics.snowflake.transformer.Main$.main(Main.scala:34)
	at com.snowplowanalytics.snowflake.transformer.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:735)
21/12/16 10:43:09 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: User class threw exception: java.lang.NoSuchMethodError: cats.kernel.Semigroup$.catsKernelMonoidForList()Lcats/kernel/Monoid;
	at com.monovore.decline.Help$.optionList(Help.scala:74)
	at com.monovore.decline.Help$.detail(Help.scala:105)
	at com.monovore.decline.Help$.fromCommand(Help.scala:50)
	at com.monovore.decline.Parser.<init>(Parser.scala:21)
	at com.monovore.decline.Command.parse(opts.scala:18)
	at com.snowplowanalytics.snowflake.core.Cli$Transformer$.parse(Cli.scala:101)
	at com.snowplowanalytics.snowflake.transformer.Main$.run(Main.scala:39)
	at cats.effect.IOApp.$anonfun$main$3(IOApp.scala:69)
	at cats.effect.internals.IOAppPlatform$.mainFiber(IOAppPlatform.scala:45)
	at cats.effect.internals.IOAppPlatform$.main(IOAppPlatform.scala:27)
	at cats.effect.IOApp.main(IOApp.scala:69)
	at cats.effect.IOApp.main$(IOApp.scala:68)
	at com.snowplowanalytics.snowflake.transformer.Main$.main(Main.scala:34)
	at com.snowplowanalytics.snowflake.transformer.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:735)
)
21/12/16 10:43:09 ERROR ApplicationMaster: Uncaught exception: 
org.apache.spark.SparkException: Exception thrown in awaitResult: 
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:507)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:271)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:902)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:901)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:901)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: java.util.concurrent.ExecutionException: Boxed Error
	at scala.concurrent.impl.Promise$.resolver(Promise.scala:87)
	at scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:79)
	at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
	at scala.concurrent.Promise.tryFailure(Promise.scala:112)
	at scala.concurrent.Promise.tryFailure$(Promise.scala:112)
	at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:187)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:754)
Caused by: java.lang.NoSuchMethodError: cats.kernel.Semigroup$.catsKernelMonoidForList()Lcats/kernel/Monoid;
	at com.monovore.decline.Help$.optionList(Help.scala:74)
	at com.monovore.decline.Help$.detail(Help.scala:105)
	at com.monovore.decline.Help$.fromCommand(Help.scala:50)
	at com.monovore.decline.Parser.<init>(Parser.scala:21)
	at com.monovore.decline.Command.parse(opts.scala:18)
	at com.snowplowanalytics.snowflake.core.Cli$Transformer$.parse(Cli.scala:101)
	at com.snowplowanalytics.snowflake.transformer.Main$.run(Main.scala:39)
	at cats.effect.IOApp.$anonfun$main$3(IOApp.scala:69)
	at cats.effect.internals.IOAppPlatform$.mainFiber(IOAppPlatform.scala:45)
	at cats.effect.internals.IOAppPlatform$.main(IOAppPlatform.scala:27)
	at cats.effect.IOApp.main(IOApp.scala:69)
	at cats.effect.IOApp.main$(IOApp.scala:68)
	at com.snowplowanalytics.snowflake.transformer.Main$.main(Main.scala:34)
	at com.snowplowanalytics.snowflake.transformer.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:735)
21/12/16 10:43:09 INFO ApplicationMaster: Deleting staging directory hdfs://ip-XXX-XX-XX-XXX.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1639650606080_0002
21/12/16 10:43:10 INFO ShutdownHookManager: Shutdown hook called