Hello everybody,
How are you? I hope you are all OK.
I need your help!!!
Let me explain the context of the problem.
We use the AWS stack with Snowplow. We have four Kinesis data streams:
Used by the Collector:
- raw_data
- bad_raw_data
Used by the Enricher:
- enriched_data
- bad_enriched_data
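In case it helps, this is roughly how we can check those streams from the AWS CLI (assuming default credentials and the eu-west-1 region we use in the config further down):

aws kinesis list-streams --region eu-west-1
aws kinesis describe-stream-summary --stream-name raw_data --region eu-west-1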
These are our Java, Collector, and Enricher versions:
OpenJDK 64-Bit Server VM (build 17-ea+11-Ubuntu-114.042, mixed mode, sharing)
snowplow-stream-enrich-kinesis-3.2.2.jar
snowplow-stream-collector-kinesis-2.7.0.jar
The Collector is working perfectly; it is on version snowplow-stream-collector-kinesis-2.7.0.jar.
This is the execution line (for the Stream Enrich asset):
/usr/bin/java -Dorg.slf4j.simpleLogger.defaultLogLevel=debug -Xms512m -Xmx1024m -jar /srv/snowplow/bin/snowplow-stream-enrich-kinesis-3.2.2.jar --config /srv/snowplow/conf/enrich_new.conf --resolver file:/srv/snowplow/conf/iglu_test_rcc.json --enrichments file:/srv/snowplow/data/enrichments
Regarding our Enricher, this is the version: snowplow-enrich-kinesis-3.2.2.jar.
This is its execution line:
/usr/bin/java -Dorg.slf4j.simpleLogger.defaultLogLevel=debug -Xms512m -Xmx1024m -jar /srv/snowplow/bin/snowplow-enrich-kinesis-3.2.2.jar --config /srv/snowplow/conf/enrich_new.conf --iglu-config /srv/snowplow/conf/iglu_test_rcc.json --enrichments /srv/snowplow/data/enrichments
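Just as a sanity check, this is how we can verify that every path referenced in those two command lines actually exists on the box:

ls -l /srv/snowplow/bin/snowplow-stream-enrich-kinesis-3.2.2.jar /srv/snowplow/bin/snowplow-enrich-kinesis-3.2.2.jar
ls -l /srv/snowplow/conf/enrich_new.conf /srv/snowplow/conf/iglu_test_rcc.json
ls -ld /srv/snowplow/data/enrichments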
So, these are our config files:
- Enricher conf:
enrich {
  streams {
    in {
      raw = "rawdata-integration"
    }
    out {
      enriched = "enricheddata-integration"
      bad = "rawdatabad-integration"
      pii = ""
      partitionKey = "event_id"
    }
    sourceSink {
      enabled = "kinesis"
      region = "eu-west-1"
      aws {
        accessKey = "iam"
        secretKey = "iam"
      }
      maxRecords = 10000
      initialPosition = "TRIM_HORIZON"
      backoffPolicy {
        minBackoff = 1000
        maxBackoff = 60000
      }
    }
    buffer {
      byteLimit = 4500000
      recordLimit = 500
      timeLimit = 60000
    }
    appName = "snowplow_enrich_progress-integration"
  }
}
- The Iglu resolver JSON:
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 1,
    "repositories": [
      {
        "name": "busuu Iglu Repo",
        "priority": 5,
        "vendorPrefixes": [ "com.customstuff" ],
        "connection": {
          "http": {
            "uri": "file:///srv/snowplow/data"
          }
        }
      },
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "file:///srv/snowplow/data"
          }
        }
      }
    ]
  }
}
As you can see, we keep our schemas on the local file system: inside /srv/snowplow/data there is a folder called schemas that contains everything.
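If I understand the Iglu repository layout correctly, the resolver maps each schema key onto a path under that root following the usual convention, so it goes looking for files like this (the angle-bracket parts are placeholders, not literal names):

/srv/snowplow/data/schemas/<vendor>/<schema_name>/jsonschema/<model>-<revision>-<addition>

and we can see what is actually on disk with, for example:

ls -R /srv/snowplow/data/schemas | head -n 40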
Regarding enrichments, we only have ip-lookup.json:
{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoIP2-City.mmdb",
        "uri": "s3://mybucket/integration"
      }
    }
  }
}
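As far as I understand the ip_lookups enrichment, the enricher joins the uri and database fields into a single S3 key, so this is how we can confirm the MaxMind file is really there (assuming the instance role can read that bucket):

aws s3 ls s3://mybucket/integration/GeoIP2-City.mmdb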
The error is this one:
usr/bin/java -Dorg.slf4j.simpleLogger.defaultLogLevel=debug -Xms512m -Xmx1024m -jar /srv/snowplow/bin/snowplow-stream-enrich-kinesis-3.2.2.jar --config /srv/snowplow/conf/enrich_new.hocon --resolver file:/srv/snowplow/conf/iglu_test_rcc.json --enrichments file:/srv/snowplow/data/iglu-central-master/
[main] DEBUG scalacache.guava.GuavaCache - Cache miss for key SchemaKey(com.snowplowanalytics.snowplow,enrichments,jsonschema,Full(1,0,0))
[main] DEBUG scalacache.guava.GuavaCache - Cache miss for key SchemaKey(com.snowplowanalytics.snowplow,enrichments,jsonschema,Full(1,0,0))
[main] DEBUG scalacache.guava.GuavaCache - Inserted value into cache with key SchemaKey(com.snowplowanalytics.snowplow,enrichments,jsonschema,Full(1,0,0))
{"error":"ResolutionError","lookupHistory":[{"repository":"Iglu Central","errors":[{"error":"RepoFailure","message":"sun.net.www.protocol.file.FileURLConnection:file:/srv/snowplow/data/schemas/com.snowplowanalytics.snowplow/enrichments/jsonschema/1-0-0 (of class sun.net.www.protocol.file.FileURLConnection)"}],"attempts":1,"lastAttempt":"2022-07-28T11:55:14.292Z"},{"repository":"Iglu Client Embedded","errors":[{"error":"NotFound"}],"attempts":1,"lastAttempt":"2022-07-28T11:55:14.307Z"},{"repository":"busuu Iglu Repo","errors":[{"error":"RepoFailure","message":"sun.net.www.protocol.file.FileURLConnection:file:/srv/snowplow/data/schemas/com.snowplowanalytics.snowplow/enrichments/jsonschema/1-0-0 (of class sun.net.www.protocol.file.FileURLConnection)"}],"attempts":1,"lastAttempt":"2022-07-28T11:55:14.315Z"}]}
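Reading the lookupHistory, both of our file:// repositories fail with a RepoFailure on the same path, so this is the exact file the resolver seems to be after, and we can check it directly on the machine:

ls -l /srv/snowplow/data/schemas/com.snowplowanalytics.snowplow/enrichments/jsonschema/1-0-0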
I don't know what the next step is. We want to keep all our schemas locally on the machine; by the way, it seems the resolver doesn't like fetching the 1-0-0 enrichments schema from our file:// repositories.
Thank you all a lot!!!
Best,
Raúl