Shredding to Redshift in the Scala Collector Flow

Hi team

We have a working realtime setup:

Scala Stream Collector -> Raw Kinesis -> Scala Enricher -> Enriched Kinesis -> Elastic Search Sink
                               ↓
                            S3 Sink

What we are trying to achieve is:

Scala Stream Collector -> Raw Kinesis -> Scala Enricher -> Enriched Kinesis -> Elastic Search Sink 
                                 ↓
                              S3 Sink
                                 ↓
                           Batch Pipeline
                                 ↓
                             Redshift

However, we haven’t been able to find any guides for this. Most of the guides we found recommend the Clojure collector when using EmrEtlRunner, which would mean redoing our realtime setup, and we would rather avoid that.

Can someone tell us whether the above is possible at all and, if so, point us to the relevant articles?

Thanks in advance.

If you set collectors.format to thrift in your EmrEtlRunner configuration file, EmrEtlRunner will then be able to enrich your data.
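A minimal sketch of where that setting sits, assuming the usual YAML layout of EmrEtlRunner's config.yml. The bucket path is a hypothetical placeholder, every other required section is omitted, and the exact layout may differ between EmrEtlRunner releases:

```yaml
# Partial EmrEtlRunner config.yml (sketch only; other required sections omitted)
aws:
  s3:
    buckets:
      raw:
        in: ["s3://my-raw-bucket/"]  # hypothetical: the location the S3 Sink writes raw records to
collectors:
  format: thrift  # raw records come from the Scala Stream Collector as Thrift
```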

As @mike says, this is possible (and common). The relevant guide is here:
