RDB shredder doesn't create S3 folder referenced in SQS message

Hi, I set up RDB Shredder and Loader v1.2.3 and things seem to run OK, except when RDB Loader receives a subsequent message via SQS referencing a folder that doesn’t exist. It seems to work the first time, but S3 folders are not created after that. Below is the message:

{
   "data" : {
      "base" : "s3://snowplow-enriched-data/good/run=2022-07-04-19-05-00/",
      "compression" : "GZIP",
      "count" : {
         "good" : 0
      },
      "processor" : {
         "artifact" : "snowplow-rdb-loader-common",
         "version" : "1.2.3"
      },
      "timestamps" : {
         "jobCompleted" : "2022-07-04T19:10:00.894930Z",
         "jobStarted" : "2022-07-04T19:05:00Z",
         "max" : null,
         "min" : null
      },
      "types" : []
   },
   "schema" : "iglu:com.snowplowanalytics.snowplow.storage.rdbloader/shredding_complete/jsonschema/1-0-1"
}

The folder missing from S3 is “run=2022-07-04-19-05-00”, whereas the folder for my previous run, “run=2022-07-04-19-00-00”, exists and was created fine by the shredder.

Is this a bug or perhaps a misconfiguration?

Hi @pt-mike,

I hope you don’t mind, in my reply I am going to call it the “transformer” instead of the “shredder”. We recently rebranded it for reasons described over here and I’m trying to be consistent from now on to avoid confusing people further!

Are you using the streaming transformer (as opposed to the batch transformer, which runs on EMR)? I know there were problems with older versions of the streaming transformer, where it would sometimes create an empty batch even if there were no events to process.
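For what it's worth, the message you posted has a "good" count of 0 and an empty "types" list, which is what I would expect an empty batch to look like. Here is a minimal Python sketch for flagging such messages while you debug. The function name is_empty_batch is made up for illustration and is not part of RDB Loader, and I'm assuming the message body is plain JSON shaped like the one you pasted:

```python
import json


def is_empty_batch(message_body: str) -> bool:
    """Return True when a shredding_complete message reports zero good events.

    Hypothetical helper for debugging only; it assumes the message has the
    same shape as the one posted in this thread.
    """
    payload = json.loads(message_body)
    data = payload.get("data", {})
    count = data.get("count") or {}
    # Only treat the batch as empty when the count is explicitly 0
    # and no shredded types were reported.
    return count.get("good") == 0 and not data.get("types")


# Trimmed copy of the message from this thread.
example = json.dumps({
    "data": {
        "base": "s3://snowplow-enriched-data/good/run=2022-07-04-19-05-00/",
        "count": {"good": 0},
        "types": [],
    },
    "schema": "iglu:com.snowplowanalytics.snowplow.storage.rdbloader/shredding_complete/jsonschema/1-0-1",
})

print(is_empty_batch(example))
```

If this returns True for the messages that fail, it would point to the empty-batch issue rather than a misconfiguration on your side.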

I believe, though, that the problem is fixed in the newest versions of RDB Loader. The newest docker images are:

# The streaming transformer:
docker pull snowplow/transformer-kinesis:4.1.0

# The Redshift loader:
docker pull snowplow/rdb-loader-redshift:4.1.0

Unfortunately, the config files for RDB Loader have changed quite a bit since version 1.2.3 (we think for the better!). You will need to check the RDB Loader docs site for the latest setup guide. There’s also some helpful information about upgrading in the release announcements for version 2.0.0 and version 3.0.0.

Hi @istreeter! I don’t mind at all. In fact, I’ve found mentions of Shredder scattered throughout the new documentation so it was confusing for me, too.

Yes, we are using the streaming transformer. I can try the new version of the streaming transformer, but I was reluctant to do that because the recommended version according to the compatibility matrix was v1.2.x (Latest compatibility matrix - Snowplow Docs).

I’ll report back with my results! Thanks!

Mike
