Shredded type not loading into Redshift


I am using the Snowplow batch pipeline and loading events into a Redshift database.

I have defined a custom unstructured event type (my_event_type) and a custom context (my_context) which is attached to this custom unstructured event type.

Lately I have found that some events are correctly loaded into the atomic.events table and the atomic.my_context table, but not into the atomic.my_event_type table.

I checked this with the following query, which should not return any rows but does:

  SELECT e.event_name, et.root_id
  FROM atomic.events e
  INNER JOIN atomic.my_context c ON c.root_id = e.event_id
  LEFT JOIN atomic.my_event_type et ON et.root_id = e.event_id
  WHERE e.event_name = 'my_event_type' AND et.root_id IS NULL;
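
For anyone who wants to see what this check flags, here is a minimal sketch on toy data. It assumes that the events table in the query above is atomic.events, uses an in-memory SQLite database as a stand-in for Redshift, and the event IDs (e1, e2) are made up. It selects e.event_id instead of et.root_id so the missing event is visible in the result.

```python
import sqlite3

# Assumption: SQLite stands in for Redshift here, and all rows are made up.
conn = sqlite3.connect(":memory:")
# Attach a second in-memory database named "atomic" so that schema-qualified
# names such as atomic.events resolve the same way they would in Redshift.
conn.execute("ATTACH DATABASE ':memory:' AS atomic")
conn.executescript("""
    CREATE TABLE atomic.events (event_id TEXT, event_name TEXT);
    CREATE TABLE atomic.my_context (root_id TEXT);
    CREATE TABLE atomic.my_event_type (root_id TEXT);

    -- e1 is fully loaded; e2 has its context row but no my_event_type row.
    INSERT INTO atomic.events VALUES ('e1', 'my_event_type'), ('e2', 'my_event_type');
    INSERT INTO atomic.my_context VALUES ('e1'), ('e2');
    INSERT INTO atomic.my_event_type VALUES ('e1');
""")

# Same anti-join as the query above: keep events that have a context row
# but no matching row in the shredded event table.
missing = conn.execute("""
    SELECT e.event_id, e.event_name
    FROM atomic.events e
    INNER JOIN atomic.my_context c ON c.root_id = e.event_id
    LEFT JOIN atomic.my_event_type et ON et.root_id = e.event_id
    WHERE e.event_name = 'my_event_type' AND et.root_id IS NULL
""").fetchall()

print(missing)
```

Only e2 comes back, since it is the one event present in the events and context tables but absent from the shredded event table.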

The problem seems to occur during the rdb_loader step, since the custom event type is processed correctly up to that point: I can see my_event_type events in the shredded/archive bucket that are missing from the atomic.my_event_type table.

The logs of the "Load AWS Redshift enriched events storage Storage Target" step don’t show anything suspicious:

RDB Loader successfully completed following steps: [Discover, Load, Analyze]

I have another environment running with the same configuration that doesn’t have this issue (the above query returns no rows there). The only differences are the EMR instance type and the fact that the environment with the issue has the PII enrichment enabled.

Any ideas on how to debug this issue?


I tried running the COPY FROM command manually on a file from the archive/shredded bucket containing a shredded type that had not been loaded correctly, and it worked.