SendGrid + Snowplow + AWS S3 & Redshift


For the target file, redshift.json, I have it there as you can see in the screenshot. Then I use the following command to run the runner:
./snowplow-emr-etl-runner run -c eer5-1.conf -r resolver1.json -t redshift3.json

Without the Redshift part, it's fine. But when I include it, I get the error shown in the screenshot. Do I need to do something different with the configuration file and resolver JSON file?


@AllenWeieiei, -t should point to the directory where your Redshift configuration file is.

Yeah, that's what I am doing. redshift3.json is my Redshift configuration file (from my understanding, it's similar to the resolver JSON file, am I right?). You can see it's there, redshift3.json. The resolver1.json is there too, and -r can correctly find it.

@ihor Or do I have to set up something different for the Redshift JSON?


@AllenWeieiei, as per the EmrEtlRunner usage wiki, -t points to a directory containing your different targets, while -r points to a single resolver file.
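To illustrate the distinction, a hypothetical layout might look like the following sketch (the `targets` directory name and the stub file contents are my own examples, not from this thread):

```shell
# Create a directory that holds all storage target configs
mkdir -p targets

# The Redshift target config lives inside that directory
# (redshift3.json would normally already exist; here we create a stub)
echo '{"schema": "", "data": {}}' > targets/redshift3.json

# -t is then given the directory, not the file itself, e.g.:
# ./snowplow-emr-etl-runner run -c eer5-1.conf -r resolver1.json -t targets
ls targets
```

With this layout, EmrEtlRunner can pick up every target file in the directory, while -r still names one specific resolver file.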

Got it. Thanks!

Hi, @ihor.

I created a new role, set its permissions to full access to Redshift and S3, and put its ARN into redshift.json. Then when I ran the runner, I got this error:

ERROR: Data loading error Problems with establishing DB connection
Amazon Error setting default driver property values.
Following steps completed: [Discover]

Does this mean I need to install some driver?


@AllenWeieiei, it's likely something with your JDBC setting in redshift.json. Can you show it to us? Usually what you want to have is just:

"jdbc": {
  "ssl": false
}

{
  "schema": "",
  "data": {
    "name": "SendGrid-Target",
    "host": "xxx",
    "database": "dev",
    "port": 5439,
    "sslMode": "DISABLE",
    "username": "xxxxx",
    "password": "xxxxx",
    "roleArn": "xxxxx",
    "schema": "atomic",
    "maxError": 1,
    "compRows": 20000,
    "sshTunnel": null,
    "purpose": "ENRICHED_EVENTS"
  }
}

@anton, Thanks, please check it!

@anton In my redshift.json, I have "sslMode": "DISABLE". Is this equivalent to what you mentioned before ("jdbc": {"ssl": false})?
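If I read anton's snippet correctly, sslMode and the jdbc object are separate fields that sit side by side inside "data"; the sketch below shows that placement with placeholder values (the exact field positions are my assumption, not confirmed in this thread):

```json
{
  "schema": "",
  "data": {
    "name": "SendGrid-Target",
    "host": "xxx",
    "sslMode": "DISABLE",
    "jdbc": {
      "ssl": false
    }
  }
}
```

In other words, "sslMode": "DISABLE" and "jdbc": {"ssl": false} are two distinct settings rather than two spellings of the same one, so it is worth having both set consistently.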

Hi, @anton. I got the runner working in my environment! (ETL from SendGrid to Redshift completed successfully)

I have a question about the normal daily running flow of the process. My DevOps teammate helped set up the collector to keep running, and the Kinesis stream gets data whenever new data comes in from SendGrid. How about the following steps?

Enrich (can we set it to run constantly, or do we need to?)
S3 Loader (it sounds like this one may not be able to run constantly because it depends on data in the enriched stream?)
EmrEtlRunner (from the documentation, this one should run on a schedule?)
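For the scheduled EmrEtlRunner step, one common approach is a cron entry. The sketch below writes a hypothetical crontab line reusing the command from earlier in this thread (the /opt/snowplow path, the targets directory, the log path, and the 03:00 UTC schedule are all my own placeholder choices):

```shell
# Hypothetical cron entry: run EmrEtlRunner once a day at 03:00.
# Paths and flags mirror the command used earlier in the thread;
# adjust them to your own deployment before installing with `crontab`.
cat > emretl.cron <<'EOF'
0 3 * * * cd /opt/snowplow && ./snowplow-emr-etl-runner run -c eer5-1.conf -r resolver1.json -t targets >> /var/log/emr-etl-runner.log 2>&1
EOF
cat emretl.cron
```

The collector, stream enrich, and S3 Loader would then stay long-running, while this batch step drains what the S3 Loader has written on its own schedule.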

I much appreciate all your and @ihor's help! I have asked many, many questions…