[BUG] No bad events loading into Postgres since updating the collector to 2.4.5

I use the Terraform modules for the Open Source Snowplow Pipeline, and I recently noticed that none of my bad events were loading into my Postgres database. They did load into S3 (which is where I ended up looking for them), but I wanted to track down why this was happening.

I hadn’t updated the Postgres loaders, but the EC2 instances had been rebuilt in the past week, so it wasn’t a case of needing to restart them. My Kinesis bad stream was set to 24 hours of data retention, so if a single problematic bad event had been blocking the loader, it should have expired by now.

I found the last bad event had loaded into the database on the 6th Jan 2022, which was just before I updated the Collector to patch the Log4J vulnerability, bringing the collector up to version 2.4.5.

I then saw this error in the logs when forcing a bad event with cURL:

[ioapp-compute-1] ERROR c.s.s.postgres.streaming.Sink - Failed StatementExecution: ERROR: value too long for type character varying(32)

Okay, so I took a look at all the tables in atomic_bad to see which columns had a 32-character limit: payload collector, encoding, processor version and payload raw loader name. I then downloaded the raw event JSON from S3 and ran it through Prettier to make it readable.
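One way to do the same check directly in Postgres is to query `information_schema.columns` for every `character varying` column in the `atomic_bad` schema and look for a maximum length of 32. This is a minimal sketch: it keeps the query as a string and filters rows shaped like its result. The sample rows (and the `example_schema_violations` table name) are hypothetical placeholders; in practice you would fetch the rows with your own Postgres client.

```python
from typing import Iterable, List, Optional, Tuple

# Standard information_schema query; "atomic_bad" is the schema named in this thread.
FIND_VARCHAR_COLUMNS = """
SELECT table_name, column_name, character_maximum_length
FROM information_schema.columns
WHERE table_schema = 'atomic_bad'
  AND data_type = 'character varying'
ORDER BY table_name, column_name;
"""

# (table_name, column_name, character_maximum_length); the length is NULL/None
# for a varchar declared without a limit.
Row = Tuple[str, str, Optional[int]]


def columns_with_limit(rows: Iterable[Row], limit: int = 32) -> List[Tuple[str, str]]:
    """Return (table, column) pairs whose varchar maximum length equals `limit`."""
    return [(table, column) for table, column, max_len in rows if max_len == limit]


if __name__ == "__main__":
    # Hypothetical rows standing in for the query result.
    sample_rows = [
        ("example_schema_violations", "payload_collector", 32),
        ("example_schema_violations", "failure_message", 4096),
    ]
    print(columns_with_limit(sample_rows))
```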

Samples from a couple of bad events:
"collector": "snowplow-stream-collector-kinesis-2.4.5-kinesis"
"v_collector": "snowplow-stream-collector-kinesis-2.4.5-kinesis"
"loaderName": "snowplow-stream-collector-kinesis-2.4.5-kinesis",

Previously the value was ssc-2.2.1-kinesis or ssc-2.3.1-kinesis (taken from the database).

So that’s the culprit: the collector name is now 47 characters long, so it no longer fits in the 32-character columns and every bad row fails the database insert.
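A quick way to confirm the diagnosis is to compare each observed value against the `character varying(32)` limit reported in the error. This is a minimal sketch using the strings quoted above; `VARCHAR_LIMIT` is simply the 32 from the log message.

```python
VARCHAR_LIMIT = 32  # from "value too long for type character varying(32)"

# Collector names seen before and after the 2.4.5 upgrade.
observed = [
    "ssc-2.2.1-kinesis",
    "ssc-2.3.1-kinesis",
    "snowplow-stream-collector-kinesis-2.4.5-kinesis",
]


def too_long(values, limit=VARCHAR_LIMIT):
    """Return (value, length) for every value that would not fit in varchar(limit)."""
    return [(v, len(v)) for v in values if len(v) > limit]


if __name__ == "__main__":
    for value, length in too_long(observed):
        print(f"{value!r} is {length} characters (limit {VARCHAR_LIMIT})")
```

Only the 2.4.5 value (47 characters) exceeds the limit; the older values are 17 characters each. Postgres counts `varchar(n)` in characters, and these strings are plain ASCII, so Python's `len` gives the same figure.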

It’s not a huge deal; it just makes it harder to catch schema violations when testing on iOS (which is what I was doing). However, it is broken and will affect everyone on v2.4.5 of the collector.

I could just write a migration myself to fix this, but I figured (a) someone else might be in the same situation if they keep their modules as obsessively up to date as I do, and (b) it should probably get an official bug fix.

Thanks,
Jo

Hi @angelsk, thanks for reporting this problem!

Could you tell us how you originally created your events table, if you remember? I’m asking because in the atomic event definition, v_collector has a maximum length of 100, and the Postgres loader code also seems to expect a maximum length of 100.

Having said that, I think it is unexpected that v_collector has changed so much between versions 2.3.x and 2.4.x so we should definitely work out what has happened there.

Hi @istreeter,

This isn’t the good events, it’s the bad events. If it had been my enriched events not loading, it defo wouldn’t have taken me 3 weeks to notice :wink:

See badrows/adapter_failures and badrows/schema_violations as examples of the 32-character limit.

As to how I created the tables? I didn’t. I used the Terraform modules to build the pipeline and initialised it back at the end of Q2 2021, and it did all that for me :slight_smile:

Thanks

Hi @angelsk, ah, I see now that it is definitely a problem with the collector. We have opened an issue for this on GitHub. We aim to make a small patch release (version 2.4.6) addressing just this issue, and to get it released within the next couple of days.

Thanks again for bringing this to our attention.

I’m not sure it’s really a collector issue. I think it would be good to bring the bad events schema in line with the good events and increase the collector name columns to 100 characters to match (with a migration as well). That way, if this happens again, the events won’t stop being ingested.
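The migration suggested here would widen the affected `atomic_bad` columns to 100 characters, matching `v_collector` in the atomic events table. This is a minimal sketch that only generates the SQL statements; the table and column names in it are hypothetical placeholders, so substitute the varchar(32) columns your own loader actually created before running anything.

```python
from typing import Iterable, List, Tuple


def quote_ident(name: str) -> str:
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def widen_varchar(schema: str, table: str, column: str, new_length: int = 100) -> str:
    """Build an ALTER TABLE statement that changes a column to varchar(new_length)."""
    return (
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} "
        f"ALTER COLUMN {quote_ident(column)} TYPE VARCHAR({new_length});"
    )


def migration(columns: Iterable[Tuple[str, str]], schema: str = "atomic_bad") -> List[str]:
    """One ALTER statement per (table, column) pair in the given schema."""
    return [widen_varchar(schema, table, column) for table, column in columns]


if __name__ == "__main__":
    # Hypothetical (table, column) pairs; replace with the real varchar(32) columns.
    hypothetical_columns = [
        ("example_schema_violations", "payload_collector"),
        ("example_adapter_failures", "payload_collector"),
    ]
    for statement in migration(hypothetical_columns):
        print(statement)
```

Postgres supports transactional DDL, so running the generated statements inside a single `BEGIN ... COMMIT` keeps the migration all-or-nothing.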

Hey @angelsk ,

We recently published Stream Collector 2.5.0 which includes a fix for this issue.

Thanks again for raising this with us!

Thanks @oguzhanunlu, is it updated in the Terraform module too?

The Terraform Module will be updated by the end of the week if all goes to plan :slight_smile:

@PaulBoocock Perfect!

@PaulBoocock Any chance there’s an update on this?

Yeah, sorry for the delay. It’s in review at https://github.com/snowplow-devops/terraform-aws-collector-kinesis-ec2/pull/20, so hopefully it’s not far off now; we’ve just been super busy this week.

@PaulBoocock Looks good to me :slight_smile: Thanks for the update.

Hey @angelsk

We’ve released 0.3.0 of the Collector modules. There are no breaking changes, so you should just be able to bump your collector modules to 0.3.0 in your Terraform.

Release info is here: Updated Terraform Modules for Collector 2.5.0
