Hi,
I am using the Snowplow batch pipeline and loading events into a Redshift database.
I have defined a custom unstructured event type (my_event_type) and a custom context (my_context) that is attached to it.
Recently I noticed that some events are loaded correctly into atomic.events and atomic.my_context but not into atomic.my_event_type.
I confirmed this with the following query, which should return no rows but does:
```sql
SELECT e.event_name, et.root_id
FROM atomic.events e
INNER JOIN atomic.my_context c ON c.root_id = e.event_id
LEFT JOIN atomic.my_event_type et ON et.root_id = e.event_id
WHERE e.event_name = 'my_event_type' AND et.root_id IS NULL;
```
The problem seems to occur during the rdb_loader step, since the custom event type is processed correctly up to that point: some my_event_type events that are missing from the atomic.my_event_type table are present in the shredded/archive bucket.
The logs of the "Load AWS Redshift enriched events storage Storage Target" step don't show anything suspicious:
```
RDB Loader successfully completed following steps: [Discover, Load, Analyze]
```
I have another environment running the same configuration that doesn't have this issue (the query above returns no rows there). The only differences are the EMR instance type and the fact that the affected environment has the PII enrichment enabled.
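A variant of the query above that might help narrow this down is to break the orphaned events down per batch run. This is only a sketch: it assumes that etl_tstamp is the same for every event produced by one batch run, so if the missing rows cluster in a few etl_tstamp values, that would point at specific loads rather than at individual events. The column aliases are hypothetical names chosen for illustration.

```sql
-- Sketch: per-run breakdown of my_event_type events whose context row
-- was loaded but whose event row is missing from atomic.my_event_type.
-- Assumption: etl_tstamp is shared by all events from one batch run,
-- so it is used here as a stand-in for "which load wrote this event".
SELECT
    e.etl_tstamp,
    -- DISTINCT guards against fan-out from multiple context rows per event
    COUNT(DISTINCT e.event_id) AS events_with_context,
    COUNT(DISTINCT CASE WHEN et.root_id IS NULL THEN e.event_id END) AS missing_my_event_type
FROM atomic.events e
INNER JOIN atomic.my_context c ON c.root_id = e.event_id
LEFT JOIN atomic.my_event_type et ON et.root_id = e.event_id
WHERE e.event_name = 'my_event_type'
GROUP BY e.etl_tstamp
ORDER BY e.etl_tstamp;
```

If the missing events turn out to be concentrated in a handful of runs, the shredded/archive output for those particular runs would be a natural next place to compare against Redshift.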
Any ideas on how to debug this issue?
Thanks,
Arthur