How to run RDB shredder?

pramod.niralakeri · December 16, 2021, 1:19am

I have a basic question: how do I run RDB Shredder?

I have this jar file from s3://snowplow-hosted-assets/4-storage/rdb-shredder/snowplow-rdb-shredder-2.2.0-rc1.jar

and a config file.

So is running RDB Shredder just a matter of executing the jar file?

pramod.niralakeri · December 29, 2021, 5:16pm

Bump

enes_aldemir · December 31, 2021, 8:54am

Hey @pramod.niralakeri,

We currently have two types of shredder.

The first is RDB Batch Shredder. It is a Spark job, so it needs to run on AWS EMR. You can use Dataflow Runner to submit the job to EMR. You can follow this guide for more information about running RDB Shredder on EMR.

The second is RDB Stream Shredder. It reads directly from a Kinesis stream and writes its output to S3. It is a plain Java application, so you don’t need a platform like EMR to run it. A reference config file for Stream Shredder can be found here. However, keep in mind that Stream Shredder is still in an experimental phase. We don’t recommend running it on high-volume pipelines.
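As a rough illustration of what "plain Java application" means in practice, the sketch below assembles a `java -jar` command line for Stream Shredder. It rests on assumptions that are not confirmed in this thread: that the application takes its HOCON config and Iglu resolver as base64-encoded strings via `--config` and `--iglu-config`, and that the jar and file names shown are the ones you actually have. All of these are hypothetical placeholders, so check the reference config and the documentation for your version before relying on them.

```python
import base64
import shlex
from pathlib import Path
from typing import List


def b64_file(path: str) -> str:
    """Return the contents of a file as a single-line base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def build_command(jar: str, config_b64: str, resolver_b64: str) -> List[str]:
    """Assemble the argument list for launching the jar.

    ASSUMPTION: the flag names --config and --iglu-config, and the fact
    that they take base64 values, are not confirmed in this thread.
    """
    return [
        "java", "-jar", jar,
        "--config", config_b64,
        "--iglu-config", resolver_b64,
    ]


def render(cmd: List[str]) -> str:
    """Render the argument list as a shell-safe string for inspection."""
    return " ".join(shlex.quote(arg) for arg in cmd)
```

To try it, something like `print(render(build_command("stream-shredder.jar", b64_file("config.hocon"), b64_file("resolver.json"))))` prints the command without running it, so it can be reviewed before being passed to `subprocess.run`. The file names in that call are placeholders as well.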

Let us know if you have any further questions.

pramod.niralakeri · December 31, 2021, 9:33am

Thank you, @enes_aldemir, that clears up my doubt. Just out of curiosity, when will a production-ready Stream Shredder be available?

Also, to run Stream Shredder in a staging environment, what output file format and compression should I use so that RDB Loader can load the output into Redshift?
