Snowplow enrichment of SendGrid webhooks seems to be missing events

Hi!

I have set up SendGrid to send events of all types to my Snowplow collector URL (using AWS and the Snowflake streaming data pipeline) and am getting data every second into my events table.

However, looking at the data itself, I am missing a solid 60%-70% of the events. The SendGrid webhook documentation specifies that multiple events are sent in the same POST request. I’m receiving roughly 1,800 delivered events per day even though I’m expecting around 280k, which leads me to believe that my collector is reading the first event and ignoring the rest of the JSON in the POST request.

How can I configure my collector / enrichment pipeline to iterate through each event in the POST request, or is there a setting in the Terraform files that needs to be turned on to allow this?

Thanks for reading my question & I’d greatly appreciate any ideas you may have!!

I think we’ve had reports of this but never been able to replicate it or tie it back to actual webhook payloads and find the issue. I haven’t seen it impacting 60-70% of volume before though.

The enricher definitely seems to handle an array with multiple events in the payload. I don’t know if there are duplicate sg_event_id values or some other issue at play.
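
One way to check this on your side is to capture a raw webhook body and count what is actually in it. Below is a minimal Python sketch, assuming the body is the JSON array SendGrid posts and that each object carries `event` and `sg_event_id` fields. The function name `summarize_sendgrid_payload` and the sample payload are made up for illustration and are not part of Snowplow or SendGrid.

```python
import json
from collections import Counter


def summarize_sendgrid_payload(raw_body: str) -> dict:
    """Count events in one webhook POST body and flag repeated sg_event_id values."""
    events = json.loads(raw_body)
    # Assumption: the body is a JSON array; wrap a single object just in case.
    if not isinstance(events, list):
        events = [events]

    ids = [e.get("sg_event_id") for e in events]
    id_counts = Counter(i for i in ids if i is not None)

    return {
        "total_events": len(events),
        "by_type": dict(Counter(e.get("event") for e in events)),
        "duplicate_sg_event_ids": sorted(i for i, n in id_counts.items() if n > 1),
        "missing_sg_event_id": sum(1 for i in ids if i is None),
    }


# Hypothetical sample body: three events, two of which share an sg_event_id.
sample_body = json.dumps([
    {"email": "a@example.com", "event": "delivered", "sg_event_id": "abc"},
    {"email": "b@example.com", "event": "delivered", "sg_event_id": "abc"},
    {"email": "a@example.com", "event": "open", "sg_event_id": "def"},
])

summary = summarize_sendgrid_payload(sample_body)
print(summary)
```

If `total_events` for a captured request is larger than the number of rows that request produced in your events table, that points at the pipeline. If `duplicate_sg_event_ids` is non-empty, deduplication downstream could be part of the picture.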

Sometimes a SendGrid payload’s size exceeds the collector’s maxBytes setting too, so the request ends up as a size_violations bad row. Raising that limit might be an option if that’s what you’re encountering.
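
To see whether size is a factor, you could measure captured request bodies against whatever limit your collector is configured with. This is a rough sketch only: the `max_bytes` value of 1,000,000 used below is an assumption for illustration, so substitute the actual maxBytes value from your own collector config. The helper names are hypothetical.

```python
def payload_size_bytes(raw_body: str) -> int:
    """Size of the request body in bytes once UTF-8 encoded."""
    return len(raw_body.encode("utf-8"))


def exceeds_limit(raw_body: str, max_bytes: int) -> bool:
    """True if the body is larger than the configured limit."""
    return payload_size_bytes(raw_body) > max_bytes


# Assumed limit for illustration only; read the real value from your collector config.
assumed_max_bytes = 1_000_000

small_body = '[{"event": "delivered", "sg_event_id": "abc"}]'
large_body = "[" + ",".join([small_body[1:-1]] * 50_000) + "]"

print(payload_size_bytes(small_body), exceeds_limit(small_body, assumed_max_bytes))
print(payload_size_bytes(large_body), exceeds_limit(large_body, assumed_max_bytes))
```

This only measures the body itself, so treat it as a rough indicator: if captured bodies are close to or over your limit, checking the bad rows stream for size_violations would be the next step.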

If you’re able to share a raw event/request payload (scrubbed of personal data) where you’re seeing only partial Snowplow events, that would be extremely helpful.