How to run RDB Shredder?

I have a stupid question.

How do I run RDB Shredder?

I have this jar file from s3://snowplow-hosted-assets/4-storage/rdb-shredder/snowplow-rdb-shredder-2.2.0-rc1.jar

and a config file.

So running RDB Shredder is just a matter of executing the jar file, right?


Hey @pramod.niralakeri,

We currently have two types of shredder.

The first one is RDB Batch Shredder. RDB Batch Shredder is a Spark job, so it needs to be run on AWS EMR. You can use Dataflow Runner to submit the job to EMR. You can follow this guide to get more information about running RDB Shredder on EMR.
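As a rough illustration of what submitting the Batch Shredder as a Spark job involves, here is a minimal Python sketch that assembles a `spark-submit` argument list of the kind an EMR step would run. It is not an official recipe. Several things are assumptions: the helper names (`b64`, `build_spark_submit_args`) are made up for this example, the `--config` and `--iglu-config` options taking base64-encoded content reflects my understanding of the shredder CLI and should be checked against the guide for your version, and the main class is not stated in this thread, so it is left as a parameter you must fill in from the docs.

```python
import base64


def b64(text: str) -> str:
    """Base64-encode a config string so it can be passed as one CLI argument."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_spark_submit_args(jar_uri: str, main_class: str,
                            config_hocon: str, resolver_json: str) -> list:
    """Build the argument list for a spark-submit call on the EMR cluster.

    Assumptions (verify against the docs for your shredder version):
    - the shredder accepts --iglu-config and --config as base64 strings;
    - main_class is supplied by the caller, it is NOT given in this thread.
    """
    return [
        "spark-submit",
        "--class", main_class,
        "--master", "yarn",
        "--deploy-mode", "cluster",
        jar_uri,
        "--iglu-config", b64(resolver_json),
        "--config", b64(config_hocon),
    ]


if __name__ == "__main__":
    args = build_spark_submit_args(
        jar_uri="s3://snowplow-hosted-assets/4-storage/rdb-shredder/"
                "snowplow-rdb-shredder-2.2.0-rc1.jar",
        main_class="<main class from the shredder docs>",  # placeholder
        config_hocon="# contents of your config file go here",
        resolver_json="{}",  # placeholder for your Iglu resolver JSON
    )
    print(" ".join(args))
```

A list like this would typically go into the arguments of a step in a Dataflow Runner playbook, which then submits it to the cluster, rather than being run by hand on your own machine.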

The second one is RDB Stream Shredder. Stream Shredder reads from a Kinesis stream directly and writes its output to S3. It is a plain Java application, so you don’t need a platform like EMR to run it. A reference config file for Stream Shredder can be found here. However, keep in mind that Stream Shredder is still in an experimental phase. We don’t recommend running it on high-volume pipelines.

Let us know if you have any further questions.

Thank you, @enes_aldemir, that clears up my doubt. Just out of curiosity: when will we get a production-ready Stream Shredder?

Also, to run it (Stream Shredder) in a staging environment, what output file format/compression should I use so that RDB Loader can pick it up and load it into Redshift?