Transformer batch 4.2.1 throws InvalidInputException

Hi,

I was updating the transformer batch from our old rdb-shredder based on [this] (Spark transformer - Snowplow Docs); however, I keep getting the following failure on the transformation step on EMR:

[screenshot of the EMR step failure: `InvalidInputException` — input path does not exist]

My playbook is:

{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/ClusterConfig/avro/1-1-0",
  "data": {
    "region": "eu-west-1",
    "credentials": {
      "accessKeyId": "AWS_ACCESS_KEY_ID",
      "secretAccessKey": "AWS_SECRET_ACCESS_KEY"
    },
    "steps": [
      {
        "type": "CUSTOM_JAR",
        "name": "S3DistCp enriched data archiving",
        "actionOnFailure": "CANCEL_AND_WAIT",
        "jar": "/usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar",
        "arguments": [
          "--src", "SP_LOADER_URI",
          "--dest", "SP_ENRICHED_URIrun={{nowWithFormat "2006-01-02-15-04-05"}}/",
          "--srcPattern", ".*",
          "--outputCodec", "gz",
          "--deleteOnSuccess"
        ]
      },
      {
        "type": "CUSTOM_JAR",
        "name": "RDB Transformer Shredder",
        "actionOnFailure": "CANCEL_AND_WAIT",
        "jar": "command-runner.jar",
        "arguments": [
          "spark-submit",
          "--class", "com.snowplowanalytics.snowplow.rdbloader.transformer.batch.Main",
          "--master", "yarn",
          "--deploy-mode", "cluster",
          "s3://snowplow-hosted-assets/4-storage/transformer-batch/snowplow-transformer-batch-4.1.0.jar",
          "--iglu-config", "{{base64File "resolver.json"}}",
          "--config", "{{base64File "config.hocon"}}"
        ]
      }
    ],
    "tags": []
  }
}

I had to use com.snowplowanalytics.snowplow.rdbloader.transformer.batch.Main instead of com.snowplowanalytics.snowplow.rdbloader.shredder.batch.Main. Other than that, since I am deploying in eu-west-1, I used the asset without the region in it: s3://snowplow-hosted-assets/4-storage…

Do you have any idea what I could do?

Hey @atordai,

The "input path does not exist" error looks suspicious. Could you check the input path in the batch transformer's config? Also, could you try version 3.0.3, please?
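
To illustrate the check being suggested: the transformer reads its input from config.hocon, and that path has to line up with the directory the S3DistCp step in the playbook writes to (its `--dest`). A minimal sketch of the relevant part of config.hocon, with hypothetical bucket names (adjust to your own setup), might look like:

```hocon
{
  # "input" must point at the folder that contains the run=... directories
  # produced by the S3DistCp archiving step (hypothetical bucket/path)
  "input": "s3://my-pipeline-bucket/enriched/archive/",

  # Where the transformer writes its transformed output (hypothetical path)
  "output": {
    "path": "s3://my-pipeline-bucket/transformed/"
  }
}
```

If `--dest` in the playbook resolves to a different prefix than `"input"` here, the transformer would fail with exactly this kind of "input path does not exist" error.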