Snowflake loader stopped processing enriched files

Hi all,
We’ve been running Snowplow for a couple of months now; however, it seems that two weeks ago the Snowflake loader suddenly stopped shipping data into Snowflake.
Here’s a snippet of the log from the point where it stopped pushing:

INFO Loader: Total 13238 messages received, 9087 loaded; Loader is in Idle state; Last state update at 2023-07-28 11:55:05.069
INFO DataDiscovery: Received a new message
INFO DataDiscovery: Total 13239 messages received, 9087 loaded
INFO DataDiscovery: New data discovery at run=2023-07-28-11-55-00 with following shredded types:
  * iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-*-* WIDEROW
  * iglu:com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-*-* WIDEROW
  * iglu:nl.basjes/yauaa_context/jsonschema/1-*-* WIDEROW
False for Info(s3://mybucket/enriched/run=2023-07-28-11-55-00/,com.snowplowanalytics.snowplow,web_page,1,Context)
False for Info(s3://mybucket/enriched/run=2023-07-28-11-55-00/,com.snowplowanalytics.snowplow,ua_parser_context,1,Context)
False for Info(s3://mybucket/enriched/run=2023-07-28-11-55-00/,nl.basjes,yauaa_context,1,Context)
INFO Load: Loading transaction for s3://mybucket/enriched/run=2023-07-28-11-55-00/ has started
INFO Load: Loading s3://mybucket/enriched/run=2023-07-28-11-55-00/
INFO Load: COPY events FROM s3://mybucket/enriched/run=2023-07-28-11-55-00/
INFO Load: Folder [s3://mybucket/enriched/run=2023-07-28-11-55-00/] has been loaded (not committed yet)
INFO Load: Folder s3://mybucket/enriched/run=2023-07-28-11-55-00/ loaded successfully
INFO Load: 0 good events were loaded.  It took minimum 16 seconds and maximum  302 seconds between the collector and warehouse for these events.  It took 306 seconds between the start of transformer and warehouse  and 5 seconds between the completion of transformer and warehouse
INFO Loader: Total 13239 messages received, 9088 loaded; Loader is in Idle state; Last state update at 2023-07-28 12:00:06.385
INFO Loader: Total 13239 messages received, 9088 loaded; Loader is in Idle state; Last state update at 2023-07-28 12:00:06.385
INFO Loader: Total 13239 messages received, 9088 loaded; Loader is in Idle state; Last state update at 2023-07-28 12:00:06.385
INFO Loader: Total 13239 messages received, 9088 loaded; Loader is in Idle state; Last state update at 2023-07-28 12:00:06.385
INFO Loader: Total 13239 messages received, 9088 loaded; Loader is in Idle state; Last state update at 2023-07-28 12:00:06.385
.........

Everything was built using the Terraform quickstart template on AWS.
Regarding the logs, the False lines have been there forever, so they don’t seem to have anything to do with this issue, although it does look like something we should investigate at some point.
Otherwise, ever since the last successful load, we’re only seeing those Idle state messages.
Also, I have checked and the enriched files are still being dropped into the S3 bucket.

Restarting the container doesn’t seem to have had any effect: the loader still says it is in Idle state, except that the message count has now been reset to 0.

Any advice as to how we can further troubleshoot the problem and fix it would be appreciated.

Problem fixed

I realised nothing was getting into the SQS queue, and then, looking at the transformer logs, I noticed it had run into OOM issues.
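For anyone hitting the same thing, here’s a rough sketch of how you could check both of those from a script with boto3. The queue name and log group below are placeholders for whatever your own deployment uses, and it assumes your containers ship their logs to CloudWatch Logs:

```python
# Sketch: check whether the transformer is still publishing to SQS, and scan
# its logs for OutOfMemoryError. Queue name and log group are placeholders.
import boto3

sqs = boto3.client("sqs")
logs = boto3.client("logs")

# How many messages are waiting for the loader? A queue stuck at 0 while
# enriched files keep landing in S3 points at the transformer.
queue_url = sqs.get_queue_url(QueueName="my-transformer-output-queue")["QueueUrl"]
attrs = sqs.get_queue_attributes(
    QueueUrl=queue_url,
    AttributeNames=["ApproximateNumberOfMessages"],
)
print("Messages waiting:", attrs["Attributes"]["ApproximateNumberOfMessages"])

# Scan the transformer's log group for OOM errors (assumes CloudWatch Logs).
events = logs.filter_log_events(
    logGroupName="/my-pipeline/transformer",
    filterPattern="OutOfMemoryError",
)
for event in events["events"]:
    print(event["message"])
```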

Hi @Florin_Finaru, which version of the loader/transformer modules are you using? I quite recently updated these modules to leverage much larger amounts of RAM on the instances to stabilize this - the release where this was added is here: Release 22.01 Western Ghats (Patch.6) · snowplow/quickstart-examples · GitHub

Hey @josh , thanks for getting back to me!
We’re using Patch 5 at the moment, but to be honest, that was probably not the issue at all. Our prod deployment slipped through with the default instance type (t3.micro) :scream: . I’m hoping t3.large instances will be enough for the moment.
We’ll look into upgrading the stack once we’re done with some of the backlog.

Upon restarting the transformer, we only got the past 24 hours’ worth of data loaded into Snowflake. We’ve got the enriched files stored in S3 between 28th July and 8th August. Would you be able to advise whether it is possible to process them as well, and if so, how to get all the missing data loaded into Snowflake?

So a few things you should look at:

  1. Default Kinesis stream retention is just 24 hours - you can extend it to 168 hours, or now even up to 365 days (this is why you are missing data: it has been dropped out of the stream). See the sketch after this list.
  2. The latest module update switches the default to a t3a.small → the AMD chipset is ~10% cheaper, so it’s worth using over plain t3. The bigger change, though, is in how much RAM is actually assigned to the process, which gives you much better throughput even on smaller nodes.
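As a rough sketch of point 1, you can bump retention on the enriched stream with a one-off call; the stream name below is a placeholder for whatever your quickstart deployment named it:

```python
# Sketch: extend retention on the enriched Kinesis stream so that a stalled
# loader has more time to recover before records age out of the stream.
# "my-enriched-stream" is a placeholder - use your deployment's stream name.
import boto3

kinesis = boto3.client("kinesis")

kinesis.increase_stream_retention_period(
    StreamName="my-enriched-stream",
    RetentionPeriodHours=168,  # default is 24; maximum is 8760 (365 days)
)

# Confirm the change took effect
summary = kinesis.describe_stream_summary(StreamName="my-enriched-stream")
print(summary["StreamDescriptionSummary"]["RetentionPeriodHours"])
```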

In terms of back-fill, that’s going to be a little bit tricky given you are using the streaming transformer as your main ingestion pipe.

The easiest option is going to be the “batch transformer”: you stage the window of data you are missing into a specific S3 directory and then use the Spark process to transform it → you can then tell the existing loader application to load this data. A staging sketch follows the docs link below.

Documentation on this can be found here: https://docs.snowplow.io/docs/pipeline-components-and-applications/loaders-storage-targets/snowplow-rdb-loader/transforming-enriched-data/spark-transformer/
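As a rough sketch of the staging step (the bucket, prefixes, and date window below are placeholders - the exact input layout the batch transformer expects is covered in the docs above), you could copy the missing run= folders into a dedicated prefix like this:

```python
# Sketch: copy the missing run= folders from the enriched archive into a
# separate staging prefix for the batch transformer to pick up.
# Bucket name, prefixes, and date window are placeholders.
import boto3

s3 = boto3.client("s3")
bucket = "mybucket"
source_prefix = "enriched/"
staging_prefix = "enriched-backfill/"

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=source_prefix):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        # Keep only run= folders inside the missing window (2023-07-28 .. 2023-08-08)
        run = key.removeprefix(source_prefix).split("/", 1)[0]
        if "run=2023-07-28" <= run <= "run=2023-08-08-23-59-59":
            # Note: copy_object handles objects up to 5 GB; use multipart copy beyond that
            s3.copy_object(
                Bucket=bucket,
                Key=staging_prefix + key.removeprefix(source_prefix),
                CopySource={"Bucket": bucket, "Key": key},
            )
```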

Hey @josh
Thanks for sharing the info above! We’ve decided the missing data is not particularly important at the moment, so we’ll just let this one go. Good to learn there’s a way to mitigate such issues in the future, though; we’ll tweak our infra setup slightly for now.

No worries @Florin_Finaru ! Good luck and hope the latest tweaks keep it running smoothly.