I’ve set up a Snowplow collector and Enrich with a Kafka sink using Docker images. Now I want to load the enriched stream into AWS Redshift directly.
I want to explore real-time continuous load, or batch load.
Please guide me through it.
There’s no continuous load into Redshift (Redshift does not support streaming ingestion), but you can do batch or micro-batch loads.
If your data is in an enriched Kafka topic, you’ll next want to:

1. Sink the enriched topic from Kafka to S3 (e.g. with a Kafka Connect S3 sink connector).
2. Shred the enriched events with the RDB shredder (run via EmrEtlRunner).
3. Load the shredded output into Redshift with the RDB loader.
What file format should I use with the Confluent S3 sink?
Wondering if you could share any standard config for the S3 sink and EmrEtlRunner.
Since I wrote the initial post, it has become possible to stream data into Redshift (but it’s in private preview).
For the Confluent S3 sink you’ll want to sink the data as gzipped TSV so that the RDB shredder/loader can pick it up from S3.
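On the streaming preview mentioned above: here is a minimal sketch of what Redshift streaming ingestion from a Kinesis stream looks like, based on the syntax as later documented (it may well differ while the feature is in private preview). The cluster endpoint, credentials, schema, view, stream, and IAM role names are all hypothetical, and it uses the `redshift_connector` Python driver.

```python
import redshift_connector

# All connection details, names, and the IAM role ARN below are placeholders.
conn = redshift_connector.connect(
    host="my-cluster.abc123.eu-west-1.redshift.amazonaws.com",
    database="snowplow",
    user="loader",
    password="CHANGE_ME",
)
conn.autocommit = True  # run the DDL outside an explicit transaction
cur = conn.cursor()

# Map the Kinesis stream into Redshift as an external schema.
cur.execute("""
    CREATE EXTERNAL SCHEMA kinesis_src
    FROM KINESIS
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-streaming-role';
""")

# A materialized view over the stream; Redshift ingests records on refresh.
# Enriched Snowplow events are TSV, so decode the payload to text.
cur.execute("""
    CREATE MATERIALIZED VIEW enriched_events_stream AUTO REFRESH YES AS
    SELECT approximate_arrival_timestamp,
           from_varbyte(kinesis_data, 'utf-8') AS enriched_tsv
    FROM kinesis_src."enriched-good";
""")
```

Note that even with streaming ingestion, the enriched TSV would still need to be parsed/shredded into Snowplow’s table layout, so this does not replace the shredder/loader step.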
Is there any alternative way to load enriched data from Kafka to Redshift?
I’m not able to set up the Confluent S3 sink (I’m facing too many issues, and the community also wasn’t really helpful).
Please help
The Confluent connector should work, but if not, Lenses also has an S3 sink connector with mostly overlapping functionality.
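For reference, a minimal sketch of registering the Confluent S3 sink via the Kafka Connect REST API, following the gzipped-TSV advice above. Enriched events are already TSV strings, so `ByteArrayFormat` passes them through unchanged while `s3.compression.type` gzips the objects. The Connect URL, topic, bucket, and region are assumptions for your deployment, and the flush/rotation values should be tuned to your volume.

```python
import json
import requests

# Hypothetical endpoint and connector name; adjust to your deployment.
CONNECT_URL = "http://localhost:8083"
CONNECTOR_NAME = "snowplow-enriched-s3-sink"

config = {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "enriched-good",                 # assumed enriched topic name
    "s3.bucket.name": "my-snowplow-enriched",  # assumed bucket
    "s3.region": "eu-west-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    # Pass the enriched TSV records through byte-for-byte, gzipped on write.
    "format.class": "io.confluent.connect.s3.format.bytearray.ByteArrayFormat",
    "s3.compression.type": "gzip",
    # Write an object every 10k records or every 10 minutes, whichever first.
    "flush.size": "10000",
    "rotate.interval.ms": "600000",
    "key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
    "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
}

# PUT creates the connector, or updates its config if it already exists.
resp = requests.put(
    f"{CONNECT_URL}/connectors/{CONNECTOR_NAME}/config",
    headers={"Content-Type": "application/json"},
    data=json.dumps(config),
)
resp.raise_for_status()
print(resp.json())
```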
Thanks Mike, appreciated.
Just curious: if we move to Kinesis (from Kafka), will we be able to load enriched Kinesis data directly into Redshift (without shredding and EMR in between)?
If so, how? Thanks
Hey @pramod.niralakeri ,
It is not possible to load enriched data to Redshift without shredding first. I tried to summarize the current Shredder landscape in this answer.
To add to the answer from @enes_aldemir - you can load data from Kinesis to Redshift directly in AWS (including enriched data, if you wanted to), but currently this data requires shredding, which takes place outside the database. In theory it is possible to shred within the database itself, but this is not currently supported by Snowplow.
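For completeness, the “directly in AWS” route is Kinesis Data Firehose with a Redshift destination: Firehose stages batches in S3 and issues a COPY into your table on your behalf. A boto3 sketch is below, with placeholder ARNs, names, and credentials throughout; as noted above, raw enriched events would still need shredding before they match Snowplow’s Redshift tables, so treat this as illustrating the mechanism only.

```python
import boto3

firehose = boto3.client("firehose", region_name="eu-west-1")

# All ARNs, names, and credentials below are placeholders.
firehose.create_delivery_stream(
    DeliveryStreamName="enriched-to-redshift",
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": "arn:aws:kinesis:eu-west-1:123456789012:stream/enriched-good",
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-source-role",
    },
    RedshiftDestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-redshift-role",
        "ClusterJDBCURL": "jdbc:redshift://my-cluster.abc123.eu-west-1"
                          ".redshift.amazonaws.com:5439/snowplow",
        "CopyCommand": {
            # Hypothetical target table; enriched TSV won't match
            # Snowplow's atomic.events layout without shredding.
            "DataTableName": "staging.enriched_events",
            "CopyOptions": "DELIMITER '\\t' GZIP",
        },
        "Username": "loader",
        "Password": "CHANGE_ME",
        # Firehose always stages the data in S3 before running COPY.
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-redshift-role",
            "BucketARN": "arn:aws:s3:::my-firehose-staging",
            "CompressionFormat": "GZIP",
        },
    },
)
```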