How to run the RDB Shredder?

Hey @pramod.niralakeri,

We currently have two types of shredder.

The first one is the RDB Batch Shredder. It is a Spark job, so it needs to run on AWS EMR. You can use Dataflow Runner to submit the job to EMR. You can follow this guide for more information about running the RDB Shredder on EMR.
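
To give a rough idea of what the submission looks like, here is a minimal Python sketch that wraps the Dataflow Runner CLI. It assumes the `dataflow-runner` binary is on your PATH and that you already have a cluster config and a playbook containing the shredder step, as described in the guide. The file names `cluster.json` and `playbook.json` and the helper functions are hypothetical, so please check the subcommand and flags against the Dataflow Runner docs for your version.

```python
import shutil
import subprocess
from typing import List


def build_run_transient_command(
    cluster_config: str,
    playbook: str,
    binary: str = "dataflow-runner",
) -> List[str]:
    """Build the argv that starts a transient EMR cluster and runs a playbook on it.

    The file paths are placeholders (assumption): the playbook is expected to
    contain the RDB Batch Shredder step taken from the guide.
    """
    return [
        binary,
        "run-transient",
        "--emr-config", cluster_config,
        "--emr-playbook", playbook,
    ]


def submit(cluster_config: str, playbook: str) -> int:
    """Run the command and return its exit code (hypothetical helper)."""
    cmd = build_run_transient_command(cluster_config, playbook)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Only prints the command; call submit(...) to actually launch the cluster.
    print(" ".join(build_run_transient_command("cluster.json", "playbook.json")))
```

Running the script only prints the command, so you can inspect it before launching anything on AWS.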

The second one is the RDB Stream Shredder. It reads directly from a Kinesis stream and writes its output to S3. It is a plain Java application, so you don’t need a platform like EMR to run it. A reference config file for the Stream Shredder can be found here. However, keep in mind that the Stream Shredder is still in an experimental phase, and we don’t recommend running it on high-volume pipelines.

Let us know if you have any further questions.