I'm in the process of setting up RDB Loader (post R35) and got stuck on the shredding part.
In my error log I can see that the file location is wrong:
ERROR Client: Application diagnostics message: User class threw exception: java.io.IOException: Not a file: s3://XXX/enriched/archive/run=2021-03-15-18-29-34/2021
It seems to read the year directory as well. However, I'm not sure where to change this. So far I have:
- Made sure the date format is correct; after an enrich run I get files in the following structure:
“s3://xxx/enriched/archive/run=2021-03-15-18-29-34/2021/03/15/” - I don’t use any custom dateFormat in the S3 sink
config.hocon looks like:
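To illustrate what I think is happening, here is a minimal sketch under my own assumptions (it is not taken from the shredder): because the keys under the run folder are nested by year/month/day, the first entry below run=... is a directory ("2021") rather than a file. The helper name `nested_entries` and the sample file name are hypothetical.

```python
def nested_entries(keys, run_prefix):
    """Return first-level entries under run_prefix that are directories, not files."""
    dirs = set()
    for key in keys:
        if not key.startswith(run_prefix):
            continue
        rest = key[len(run_prefix):]
        # A remaining "/" means the file sits in a subdirectory of the run folder.
        if "/" in rest:
            dirs.add(rest.split("/", 1)[0])
    return sorted(dirs)


run_prefix = "enriched/archive/run=2021-03-15-18-29-34/"
keys = [
    run_prefix + "2021/03/15/part-00000.gz",  # hypothetical file name
]
print(nested_entries(keys, run_prefix))
```

If this returns anything for a run folder, the files are not sitting directly under run=..., which matches the "Not a file" error above.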
{
  "name": "{{client}}",
  "id": "24cda775-ea2d-4cfd-b4f8-b580670cb465",
  "region": "{{aws_region}}",
  "messageQueue": "{{fifo_que}}",
  "shredder": {
    "input": "s3://{{s3_shredded}}/enriched/archive/",
    "output": "s3://{{s3_shredded}}/good/",
    "outputBad": "s3://{{s3_shredded}}/bad/",
    "compression": "GZIP"
  },
  "formats": {
    "default": "TSV",
    "json": {{shredded_as_jsons}},
    "tsv": {{shredded_as_tsvs}},
    "skip": {{skip_schemas}}
  },
  "storage": {
    "type": "redshift",
    "host": "{{redshift_hostname}}",
    "database": "{{snowplow_database_name}}",
    "port": {{db_port}},
    "roleArn": "{{roleArn}}",
    "schema": "{{schema_name}}",
    "username": "{{username}}",
    "password": "{{password}}",
    "jdbc": {"ssl": true},
    "maxError": 10,
    "compRows": 100000
  },
  "steps": {{steps}},
  monitoring = {
    "snowplow": {
      "collector": "{{collectorUri}}"
      "appId": "{{appName}}"
      method:"get"
    },
    "sentry": null
  }
}
and playbook.json is:
{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/PlaybookConfig/avro/1-0-1",
  "data": {
    "region": "{{aws_region}}",
    "credentials": {
      "accessKeyId": "default",
      "secretAccessKey": "default"
    },
    "steps": [
      {
        "type": "CUSTOM_JAR",
        "name": "S3DistCp enriched data archiving",
        "actionOnFailure": "CANCEL_AND_WAIT",
        "jar": "/usr/share/aws/emr/s3-dist-cp/lib/s3-dist-cp.jar",
        "arguments": [
          "--src", "s3://{{s3_enriched_bucket}}/enriched/good/",
          "--dest", "s3://{{s3_shredded}}/enriched/archive/run={{nowWithFormat "2006-01-02-15-04-05"}}/",
          "--s3Endpoint", "s3-{{aws_region}}.amazonaws.com",
          "--srcPattern", ".*",
          "--outputCodec", "gz",
          "--deleteOnSuccess"
        ]
      },
      {
        "type": "CUSTOM_JAR",
        "name": "RDB Shredder",
        "actionOnFailure": "CANCEL_AND_WAIT",
        "jar": "command-runner.jar",
        "arguments": [
          "spark-submit",
          "--class", "com.snowplowanalytics.snowplow.shredder.Main",
          "--master", "yarn",
          "--deploy-mode", "cluster",
          "s3://snowplow-hosted-assets-{{aws_region}}/4-storage/rdb-shredder/snowplow-rdb-shredder-0.19.0.jar",
          "--iglu-config", "{{base64File "/config/iglu_resolver.json"}}",
          "--config", "{{base64File "/config/config.hocon"}}"
        ]
      }
    ],
    "tags": [ ]
  }
}
Input welcome.
Best
f