Hi,
We managed to set up the shredder and the SQS queue so that the RDB Loader is notified after a shredding job has completed. However, the RDB Loader cannot find the JSONPath files for our events as specified in our resolver.json. We use the same resolver.json as for our enrichment module.
Error message:
2021-03-02 14:20:22 ERROR 2021-03-02 13:20:22.534: Data discovery error with following issues:
2021-03-02 14:20:22 JSONPath file [com.myapp/my_tracking_event_1.json] was not found
2021-03-02 14:20:22 JSONPath file [com.myapp/other_tracking_event_1.json] was not found
2021-03-02 14:20:21 INFO 2021-03-02 13:20:21.626: Received new message. Total 1 messages received, 0 loaded, 0 attempts has been made to load current folder
2021-03-02 13:53:43 INFO 2021-03-02 12:53:43.835: RDB Loader [myapp] has started. Listening sp-sqs-queue.fifo
We host the JSONPath files in an S3 bucket in exactly that folder: s3://our-schema-repo/jsonpaths/com.myapp/my_tracking_event_1.json/.
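To make the expected location concrete, here is a minimal Python sketch of how the file name in the error message relates to a schema's vendor, name and model. It assumes the `vendor/name_model.json` layout seen in the log lines above. The `jsonpath_key` helper and its `prefix` argument are hypothetical names for illustration and are not part of the RDB Loader.

```python
def jsonpath_key(vendor: str, name: str, model: int, prefix: str = "jsonpaths") -> str:
    """Build an object key for a JSONPath file.

    Assumption: the layout is <prefix>/<vendor>/<name>_<model>.json,
    inferred from the file names in the error log, not from loader source.
    """
    return f"{prefix.strip('/')}/{vendor}/{name}_{model}.json"


# Values taken from the first "was not found" log line.
key = jsonpath_key("com.myapp", "my_tracking_event", 1)
print(key)
```

The resulting key is a plain object name without a trailing slash, which can be compared against what is actually stored in the bucket.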
Our resolver.json:
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-0",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "S3-schemas-registry",
        "priority": 0,
        "vendorPrefixes": ["com.myapp"],
        "connection": {
          "http": {
            "uri": "SP_SCHEMA_URI"
          }
        }
      },
      {
        "name": "Iglu Central",
        "priority": 1,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      },
      {
        "name": "Iglu Central - Mirror 01",
        "priority": 2,
        "vendorPrefixes": ["com.snowplowanalytics"],
        "connection": {
          "http": {
            "uri": "http://mirror01.iglucentral.com"
          }
        }
      }
    ]
  }
}
SP_SCHEMA_URI is replaced by a CloudFront URI that is associated with the S3 bucket hosting our schemas.
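For reference, the substitution works roughly like the following Python sketch, which replaces the placeholder and checks that the result still parses as JSON. The `render_resolver` helper and the `https://example.cloudfront.net` address are made up for illustration; our real deployment uses its own templating and its own CloudFront URI.

```python
import json


def render_resolver(template: str, schema_uri: str) -> dict:
    """Replace the SP_SCHEMA_URI placeholder and parse the result as JSON."""
    rendered = template.replace("SP_SCHEMA_URI", schema_uri)
    return json.loads(rendered)


# Trimmed-down version of the resolver above, kept short for the example.
template = """
{
  "data": {
    "repositories": [
      {
        "name": "S3-schemas-registry",
        "vendorPrefixes": ["com.myapp"],
        "connection": {"http": {"uri": "SP_SCHEMA_URI"}}
      }
    ]
  }
}
"""

# Hypothetical CloudFront address, used only as a stand-in.
resolver = render_resolver(template, "https://example.cloudfront.net")
print(resolver["data"]["repositories"][0]["connection"]["http"]["uri"])
```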
The RDB Loader is running as an AWS Fargate task, which should have permissions to access the S3 bucket.
We did not find any field in config.hocon where the JSONPath location should be set. Is that required?
{
  "name": "myapp",
  "id": "d5a4aab5-7b66-11eb-8ba2-acde48001122",
  "region": "eu-west-1",
  "messageQueue": "SQS_QUEUE",
  "shredder": {
    "input": "SP_ENRICHED_URI",
    "output": "SP_SHREDDED_GOOD_URI",
    "outputBad": "SP_SHREDDED_BAD_URI",
    "compression": "GZIP"
  },
  "formats": {
    "default": "JSON",
    "json": [ ],
    "tsv": [ ],
    "skip": [ ]
  },
  "storage" = {
    "type": "redshift",
    "host": "redshift.amazon.com",
    "database": "DATABASE",
    "port": 5439,
    "roleArn": "arn:aws:iam::AWS_ACCOUNT_NUMBER:role/RedshiftLoadRole",
    "schema": "atomic",
    "username": "DB_USER",
    "password": "DB_REDSHIFT_PASSWORD",
    "jdbc": {"ssl": true},
    "maxError": 10,
    "compRows": 100000
  },
  "steps": ["analyze"],
  "monitoring": {
    "snowplow": null,
    "sentry": null
  }
}