Hello all! I’m setting up a Snowplow loader that reads data from a Kinesis stream and loads it into Databricks. We have Iglu configs that work on other instances, but the loader keeps failing on my EC2 instance.
I’m running the following Docker command, passing in two configs as arguments (both are pasted below):
sudo docker run snowplow/rdb-loader-databricks:4.2.0 --iglu-config ./encoded.iglu.resolver.json --config $(cat ./databricks.config.hocon)
I keep getting an "ERROR Unexpected argument: schema" error. When I remove the schema line, that error becomes "ERROR Unexpected argument: data". I suspect it’s an issue with the Docker image itself, but I’m not sure where to go from here.
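One thing that may be worth ruling out is how the shell hands the config to the container: an unquoted $(cat ...) is split on whitespace, so the file contents arrive as many separate arguments rather than one value. The sketch below only illustrates that shell behaviour; the temp file and the count_args helper are made up for the demo and are not part of the loader, and this is not a confirmed diagnosis of the error above.

```shell
# Hypothetical stand-in for a small HOCON config file.
tmp="$(mktemp)"
printf 'storage {\n  schema = "atomic"\n}\n' > "$tmp"

# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Unquoted: the shell splits the file contents on whitespace,
# so each word becomes its own argument.
unquoted=$(count_args --config $(cat "$tmp"))

# Quoted: the whole file is passed as a single argument after --config.
quoted=$(count_args --config "$(cat "$tmp")")

echo "unquoted: $unquoted arguments"
echo "quoted:   $quoted arguments"
rm -f "$tmp"
```

If the counts differ on your machine, the same splitting would apply to the --config value in the docker command above.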
Iglu config:
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-2",
  "data": {
    "cacheSize": 500,
    "cacheTtl": 600,
    "repositories": [{"connection":{"http":{"uri":"http://iglucentral.com"}},"name":"Iglu Central","priority":10,"vendorPrefixes":[]},{"connection":{"http":{"uri":"http://mirror01.iglucentral.com"}},"name":"Iglu Central - Mirror 01","priority":20,"vendorPrefixes":[]},{"connection":{]}]
  }
}
Snowplow config:
source = "kinesis"
sink {
  good = "databricks"
  bad = "kinesis"
}
enabled = "good"
aws {
  accessKey = iam
  secretKey = iam
}
queue {
  enabled = kinesis
  initialPosition = "TRIM_HORIZON"
  initialTimestamp = ""
  maxRecords = 10000
  region = "us-west-2"
  appName = ""
  disableCloudWatch = true
}
streams {
  inStreamName = "enriched-stream"
  outStreamName = "invalid-stream"
  buffer {
    byteLimit = 1000000
    recordLimit = 500
    timeLimit = 500
  }
}
databricks {
  "messageQueue": "test-queue",
  "storage" : {
    "host": ""
    "password": ""
    "schema": "iglu_resolver",
    "port": 443,
    "httpPath": "",
  }