SendGrid + Snowplow + AWS S3 & Redshift


For the target file, redshift.json, I have it there as you can see in the screenshot. Then I use the following command to run the runner:
./snowplow-emr-etl-runner run -c eer5-1.conf -r resolver1.json -t redshift3.json

Without the Redshift part, it's fine. But when I include it, I get the error shown in the screenshot. Do I need to do something different with the configuration file and resolver JSON file?


@AllenWeieiei, -t should point to the directory where your Redshift configuration file is.

Yeah, that's what I am doing. redshift3.json is my Redshift configuration file (from my understanding, it's similar to the resolver JSON file, am I right?). You can see it's there, redshift3.json. The resolver1.json is there too, and -r can correctly find it.

@ihor Or do I have to set up something different for the Redshift JSON?


@AllenWeieiei, as per the EmrEtlRunner usage wiki, -t points to a directory containing your different targets, while -r points to a single resolver file.
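To illustrate the distinction, a hypothetical layout might look like the following sketch (the `targets` directory name and the stub file contents are my own examples, not from this thread):

```shell
# Create a directory that holds all storage target configs
mkdir -p targets

# The Redshift target config lives inside that directory
# (redshift3.json would normally already exist; here we create a stub)
echo '{"schema": "", "data": {}}' > targets/redshift3.json

# -t is then given the directory, not the file itself, e.g.:
# ./snowplow-emr-etl-runner run -c eer5-1.conf -r resolver1.json -t targets
ls targets
```

With this layout, EmrEtlRunner can pick up every target file in the directory, while -r still names one specific resolver file.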

Got it. Thanks!

Hi, @ihor.

I created a new role, set its permissions to full access to Redshift and S3, and put its ARN into redshift.json. Then when I ran the runner, I got this error:

ERROR: Data loading error Problems with establishing DB connection
Amazon Error setting default driver property values.
Following steps completed: [Discover]

Does this mean I need to install some driver?


@AllenWeieiei, it's likely something with your JDBC setting in redshift.json. Can you show it to us? Usually what you want to have is just:

"jdbc": {
  "ssl": false
}

{
  "schema": "",
  "data": {
    "name": "SendGrid-Target",
    "host": "xxx",
    "database": "dev",
    "port": 5439,
    "sslMode": "DISABLE",
    "username": "xxxxx",
    "password": "xxxxx",
    "roleArn": "xxxxx",
    "schema": "atomic",
    "maxError": 1,
    "compRows": 20000,
    "sshTunnel": null,
    "purpose": "ENRICHED_EVENTS"
  }
}

@anton, Thanks, please check it!

@anton In my redshift.json, I have "sslMode": "DISABLE". Is this equivalent to what you mentioned before ("jdbc": {"ssl": false})?
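If I read anton's snippet correctly, sslMode and the jdbc object are separate fields that sit side by side inside "data"; the sketch below shows that placement with placeholder values (the exact field positions are my assumption, not confirmed in this thread):

```json
{
  "schema": "",
  "data": {
    "name": "SendGrid-Target",
    "host": "xxx",
    "sslMode": "DISABLE",
    "jdbc": {
      "ssl": false
    }
  }
}
```

In other words, "sslMode": "DISABLE" and "jdbc": {"ssl": false} are two distinct settings rather than two spellings of the same one, so it is worth having both set consistently.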

Hi, @anton. I got the runner working in my environment! (ETL from SendGrid to Redshift completed successfully)

I have a question about the normal daily running flow of the process. My DevOps teammate helped set up the collector to keep running, and the Kinesis stream gets data whenever new data comes in from SendGrid. How about the following steps?

Enrich (can we set it to run constantly, or do we need to?)
S3 Loader (it sounds like this one may not be able to run constantly because it depends on data in the enriched stream?)
EmrEtlRunner (from the documentation, this one should run on a schedule?)
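For the scheduled EmrEtlRunner step, one common approach is a cron entry. The sketch below writes a hypothetical crontab line reusing the command from earlier in this thread (the /opt/snowplow path, the targets directory, the log path, and the 03:00 UTC schedule are all my own placeholder choices):

```shell
# Hypothetical cron entry: run EmrEtlRunner once a day at 03:00.
# Paths and flags mirror the command used earlier in the thread;
# adjust them to your own deployment before installing with `crontab`.
cat > emretl.cron <<'EOF'
0 3 * * * cd /opt/snowplow && ./snowplow-emr-etl-runner run -c eer5-1.conf -r resolver1.json -t targets >> /var/log/emr-etl-runner.log 2>&1
EOF
cat emretl.cron
```

The collector, stream enrich, and S3 Loader would then stay long-running, while this batch step drains what the S3 Loader has written on its own schedule.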

I much appreciate all your and @ihor's help! I have asked many, many questions…