Excluding field from atomic.events

Hey all,

GDPR is coming and I want to minimize PII data, especially IP addresses. However, I do want to keep location data such as country and city.

Is there a way to exclude a field from the shredded JSON before it is loaded into Redshift?

Thanks

Nir Sivan

Hi @NirSivan,

Is the IP anonymisation enrichment a good solution for you? It doesn't remove the IP address from the raw events in S3, but it anonymises the IP address before the data is loaded into Redshift.
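
Conceptually, IP anonymisation masks part of the address so it no longer points at an individual. Below is a minimal Python sketch of that idea, assuming the trailing octets of an IPv4 address are replaced with "x". The function `anonymise_ipv4` and its `anon_octets` parameter are hypothetical illustrations; the real enrichment is configured in the pipeline, not called from Python.

```python
import ipaddress


def anonymise_ipv4(ip: str, anon_octets: int = 1) -> str:
    """Replace the last `anon_octets` octets of an IPv4 address with "x".

    Illustrative sketch only, not the Snowplow enrichment's implementation.
    """
    if not 0 <= anon_octets <= 4:
        raise ValueError("anon_octets must be between 0 and 4")
    # Validate the input; raises ValueError for anything that is not IPv4.
    ipaddress.IPv4Address(ip)
    octets = ip.split(".")
    keep = 4 - anon_octets
    # Keep the leading octets and mask the rest.
    return ".".join(octets[:keep] + ["x"] * anon_octets)
```

For example, `anonymise_ipv4("192.168.1.42", 2)` keeps only the network-level prefix `192.168`, which is the trade-off between privacy and usefulness that the number of masked octets controls.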

Best,

Further to @Colm’s note: if you want to anonymize any other field in your Snowplow data, stay tuned for our next release (R100), which introduces a “PII Enrichment” that lets you do precisely that.
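
As a rough illustration of what anonymising an arbitrary field can look like, here is a minimal sketch that pseudonymises selected fields by replacing them with a salted SHA-256 hash. The function `pseudonymise_fields`, the field names and the salt are hypothetical; this shows the general hashing technique, not the configuration or internals of the R100 PII Enrichment.

```python
import hashlib
from typing import Dict, Iterable


def pseudonymise_fields(event: Dict[str, str], fields: Iterable[str], salt: str) -> Dict[str, str]:
    """Return a copy of `event` with the named fields replaced by salted SHA-256 hex digests."""
    result = dict(event)  # leave the caller's event untouched
    for field in fields:
        value = result.get(field)
        if value is None:
            continue  # field absent: nothing to hide
        # Same input and salt always give the same digest, so values stay joinable.
        result[field] = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return result
```

Hashing rather than deleting keeps the field usable for counting and joining (the same user still maps to the same digest), while the raw value no longer reaches the warehouse.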

To add some more detail on my previous answer:

IP anonymisation is available now, and further PII anonymisation is scheduled for R100, as Alex points out. Together, these take care of the enriched events on S3 and the data in Redshift.

However, the raw collector logs in S3 still need to be dealt with; the best solution at the moment is to set up S3 lifecycle rules for them. Please note that deleting this information from the raw logs makes it impossible to reprocess the data, so it's best to keep a buffer period before you delete anything from the raw bucket, in case there's a pipeline failure.

For example, one solution that has worked in the past is a lifecycle rule that deletes files from the S3 buckets after one week.
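
A lifecycle rule like this can also be applied programmatically. The minimal sketch below builds the configuration dict in the shape boto3's `put_bucket_lifecycle_configuration` expects; the helper `raw_log_expiry_config` and the bucket name in the comment are hypothetical, and the same rule can be created in the S3 console instead. Choose `days` so it still covers your reprocessing buffer.

```python
from typing import Any, Dict


def raw_log_expiry_config(prefix: str = "", days: int = 7) -> Dict[str, Any]:
    """Build an S3 lifecycle configuration expiring objects under `prefix` after `days` days.

    An empty prefix applies the rule to every object in the bucket.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": f"expire-raw-logs-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }

# Usage (hypothetical bucket name; requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-snowplow-raw-bucket",
#       LifecycleConfiguration=raw_log_expiry_config(days=7),
#   )
```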

I hope this is helpful.

Thanks for the assistance, much appreciated!

Hi @NirSivan,

A quick follow-up on Alex’s note:

Further to @Colm’s note - if you want to anonymize any other field in your Snowplow data, then stay tuned for our next release (R100), which introduces a “PII Enrichment” which lets you do precisely that…

This has now been released; you can see the details here.

Best,
