Unexpected Argument Error

Hello all! I’m working on setting up a Snowplow loader that reads data from a Kinesis stream and dumps it into Databricks. We have Iglu configs that are working on other instances, but it keeps failing on my EC2 instance.
I’m running the following docker command and passing in two configs as arguments (I’ve pasted them below):
sudo docker run snowplow/rdb-loader-databricks:4.2.0 --iglu-config ./encoded.iglu.resolver.json --config $(cat ./databricks.config.hocon)

I keep getting ERROR Unexpected argument: schema. When I remove the schema line, the error becomes ERROR Unexpected argument: data. I suspect it’s an issue with the docker image itself, but I’m not sure where to go from here.

Iglu config:

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-2",
  "data": {
    "cacheSize": 500,
    "cacheTtl": 600,
    "repositories": [{"connection":{"http":{"uri":"http://iglucentral.com"}},"name":"Iglu Central","priority":10,"vendorPrefixes":[]},{"connection":{"http":{"uri":"http://mirror01.iglucentral.com"}},"name":"Iglu Central - Mirror 01","priority":20,"vendorPrefixes":[]},{"connection":{]}]
  }
}

Snowplow config:

source = "kinesis"
sink {
  good = "databricks"
  bad = "kinesis"
}
enabled = "good"
aws {
  accessKey = iam
  secretKey = iam
}
queue {
  enabled = kinesis
  initialPosition = "TRIM_HORIZON"
  initialTimestamp = ""
  maxRecords = 10000
  region = "us-west-2"
  appName = ""
  disableCloudWatch = true
}
streams {
  inStreamName = "enriched-stream"
  outStreamName = "invalid-stream"
  buffer {
    byteLimit = 1000000
    recordLimit = 500
    timeLimit = 500
  }
}
databricks {
  "messageQueue": "test-queue",

  "storage" : {
    "host": ""
    "password": ""
    "schema": "iglu_resolver",
    "port": 443,
    "httpPath": "",
}

Hi @amal.eldick - Awesome that you’re interested in the Snowplow/Databricks integration. This is still quite a new integration, so I’m excited to see it more widely used.

I was not able to recreate your problem exactly, but I found a few other things wrong that I can help you with.

First of all, your iglu config is not valid JSON. If you look towards the end of line 6, you currently have "connection":{] so it looks like something has gone wrong with your closing brace. This page on our docs site has an example of what a valid Iglu resolver should look like.
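For reference, here is what your resolver would look like with that broken fragment removed and the repositories laid out readably (same two repositories and cache settings you already have; nothing new added):

```json
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-2",
  "data": {
    "cacheSize": 500,
    "cacheTtl": 600,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 10,
        "vendorPrefixes": [],
        "connection": {"http": {"uri": "http://iglucentral.com"}}
      },
      {
        "name": "Iglu Central - Mirror 01",
        "priority": 20,
        "vendorPrefixes": [],
        "connection": {"http": {"uri": "http://mirror01.iglucentral.com"}}
      }
    ]
  }
}
```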

Next, the RDB loader requires that you provide base64-encoded configuration on the command line; it is not allowed to pass a file name. (This is something I want to change in future, but that’s another story). So it should look more like this:

--iglu-config=<BASE64 CONFIG HERE> --config=<BASE64 CONFIG HERE>

# or...

--iglu-config=$(base64 -w0 ./iglu.resolver.json) --config=$(base64 -w0 ./databricks.config.hocon)
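Putting it together with the docker command from your post, the full invocation would look something like this (same image tag as in your post; the files should be the plain JSON/HOCON configs, not ones that are already base64-encoded — note -w0 is the GNU coreutils flag that disables line wrapping, so each encoded config stays on a single line that is safe to pass as a CLI argument):

```shell
# Encode each config to a single-line base64 string, then pass the
# encoded strings rather than the file names to the loader.
IGLU_B64=$(base64 -w0 ./iglu.resolver.json)
CONFIG_B64=$(base64 -w0 ./databricks.config.hocon)

sudo docker run snowplow/rdb-loader-databricks:4.2.0 \
  --iglu-config="$IGLU_B64" \
  --config="$CONFIG_B64"
```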

Finally – and this is the biggest thing – the config file you shared is not at all a valid config for RDB Loader. May I ask, where did you copy this example config from?

To go from the kinesis stream into Databricks you actually need to run two separate applications:

  1. First run the RDB transformer. This application reads from Kinesis and writes small files to S3 ready for loading. The documentation for this application is here and there is an example configuration file over here.
  2. Next run the loader. This application loads the files created in step 1. The documentation for this application is here and there is an example configuration file over here

Please let us know how you get on!