Hello there!
I am really excited about the latest release with the new bad row format!
Good job on that!
Today I was looking into how to upgrade from r117 to r118.
Before deploying in our AWS environment, I always try to run everything locally using the NSQ stack.
Basically, I am using the example from snowplow-docker, which I update on a regular basis.
I am running into an issue with the referer parser enrichment.
After following the upgrade manual, I get the following error when my stream-enrich container starts:
stream-enrich_1 | An error occured: Scheme s3 for file s3://snowplow-hosted-assets/third-party/referer-parser/referer-tests.json not supported
example_stream-enrich_1 exited with code
Looking at the code, my first guess is that this happens because scala.io.Source does not support that scheme.
To work around that I tried to mount a local volume to my container and use a file URI such as:
{
  "schema": "iglu:com.snowplowanalytics.snowplow/referer_parser/jsonschema/2-0-0",
  "data": {
    "vendor": "com.snowplowanalytics.snowplow",
    "name": "referer_parser",
    "enabled": true,
    "parameters": {
      "database": "referer-parser.json",
      "internalDomains": [
        "www.subdomain1.snowplowanalytics.com"
      ],
      "uri": "file:///snowplow/bin"
    }
  }
}
But it also fails with the same error, this time for the file scheme.
After some more testing, I managed to get something running thanks to the tests, by using the following configuration:
{
  "schema": "iglu:com.snowplowanalytics.snowplow/referer_parser/jsonschema/2-0-0",
  "data": {
    "vendor": "com.snowplowanalytics.snowplow",
    "name": "referer_parser",
    "enabled": true,
    "parameters": {
      "database": "referer-tests.json",
      "internalDomains": [
        "www.subdomain1.snowplowanalytics.com"
      ],
      "uri": "https://s3-eu-west-1.amazonaws.com/snowplow-hosted-assets/third-party/referer-parser/"
    }
  }
}
So apparently HTTP(S) schemes are the only ones considered valid.
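For anyone hitting the same error, a small pre-flight check on the enrichment JSON can flag the problem before the container starts. This is only a sketch in Python: the set of accepted schemes is an assumption inferred from the errors above, not taken from the stream-enrich source, and the function names are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: inferred from the "Scheme ... not supported" errors in this post,
# not from the stream-enrich source code.
ACCEPTED_SCHEMES = {"http", "https"}


def database_url(enrichment: dict) -> str:
    """Join the 'uri' and 'database' parameters into the full database location."""
    params = enrichment["data"]["parameters"]
    return params["uri"].rstrip("/") + "/" + params["database"]


def unsupported_scheme(enrichment: dict):
    """Return the URI scheme if it is outside ACCEPTED_SCHEMES, otherwise None."""
    uri = enrichment["data"]["parameters"]["uri"]
    scheme = urlparse(uri).scheme.lower()
    return None if scheme in ACCEPTED_SCHEMES else scheme


def check(enrichment: dict) -> None:
    """Print a short report for one referer_parser enrichment configuration."""
    scheme = unsupported_scheme(enrichment)
    if scheme is None:
        print("OK:", database_url(enrichment))
    else:
        print("Scheme", scheme, "is likely to be rejected for", database_url(enrichment))
```

Loading the enrichment file with json.load and passing the result to check would report the s3 and file URIs from this post as likely to be rejected, and let the https one through.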
I also noticed that this new version of the enrichment configuration points to a referer-tests.json database, which is significantly smaller than the referers-latest.yml.
Is there any plan to deploy the full referer-latest.yml as JSON in this S3 bucket?
Both of these issues are blockers for my upgrade, as they seem to be regressions compared to r117.
Thanks in advance for your support and keep up the good work!