No bad data in S3


I have deployed Snowplow on AWS using Terraform. In Kinesis I can see four streams: raw, enriched, bad-1, and bad-2. However, the S3 bucket contains only a single transformed folder, with just a good folder inside it. So bad data isn't being written anywhere.
What could be the problem here?

Hey @cealkate the easiest way to check whether you really only have good data is to create some bad data and validate things!

The simplest “bad” payload is simply:

curl -XGET <collector_endpoint>/i --output -

This sends an invalid (empty) payload to the Collector, which will land in the bad queue.
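If curl isn't handy, the same invalid payload can be sent from Python. This is a minimal sketch; the collector endpoint is a placeholder you'd substitute with your own, and the helper names are mine, not part of Snowplow:

```python
import urllib.request

def bad_event_url(collector: str) -> str:
    """Build the /i pixel URL; a bare GET with no query string is an
    invalid payload, so the event should land in the bad stream."""
    return collector.rstrip("/") + "/i"

def send_bad_event(collector: str) -> int:
    """Fire the request; the Collector should still answer HTTP 200,
    while the event itself fails validation downstream."""
    with urllib.request.urlopen(bad_event_url(collector)) as resp:
        return resp.status

# Example (placeholder endpoint -- substitute your own collector):
# send_bad_event("https://collector.example.com")
```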

Then check whether a bad folder appears at the S3 location you configured the S3 Loader to write bad data to.
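As a rough sketch of where to look, assuming the quick-start convention of a single bucket with good/bad prefixes under transformed/ (the prefix names and bucket name below are assumptions; check your own loader configuration):

```python
def candidate_prefixes(bucket: str) -> list[str]:
    """Hypothetical S3 prefixes to inspect for good vs. bad rows,
    assuming the quick-start's single-bucket layout."""
    return [
        f"s3://{bucket}/transformed/good/",
        f"s3://{bucket}/transformed/bad/",
    ]

# Placeholder bucket name -- substitute your own pipeline bucket.
for prefix in candidate_prefixes("my-pipeline-bucket"):
    print(prefix)
```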

If nothing appears within roughly 10 minutes, check the logs for that application and verify that it is actually working.

Did you use the quick-start to spin up the pipeline?

After sending a bad event, I cannot see any bad folder in S3. The logs show the following:

The record is written only to the raw stream, not the bad one.

Yes, I used the quick-start repository as the basis for the deployment.

Hey @cealkate and have you checked the application logs for the “bad” S3 Loader?

P.S. You can also use this tool to send a few good & bad events: Tracking your first events | Snowplow Documentation. (Note: it only works with https:// collector URLs, due to the “mixed content” blocking in the browser.)

I can’t see that loader in CloudWatch logs; here are the only ones available:
[Screenshot 2023-10-13 at 17.08.20]

Hi @cealkate and have you enabled this service in the vars file?

I cannot find the variable s3_bad_enabled anywhere in the Terraform modules. Shouldn’t it be enabled by default?

So to reiterate here - there won’t be any “bad” data in S3 unless that loader has been deployed. It looks increasingly like that module has not been deployed, as it would otherwise have an entry in CloudWatch logs.

Did you fork or customize the quick-start for your own purposes, which could explain why this loader is not present?

If you followed the default pipeline setup, the loader is indeed deployed by default; so unless you removed or disabled it using the options I linked, it should be there.