Is the IP anonymisation enrichment a good solution for you? It doesn't get rid of the events in S3, but it blanks out the IP address before the data is loaded into Redshift.
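To illustrate what octet-based IP anonymisation does to a value, here is a minimal Python sketch. It is not Snowplow's implementation. The function name `anonymize_ipv4` and its `anon_octets` argument are hypothetical, loosely modelled on the enrichment's `anonOctets` setting (check the enrichment docs for its exact parameters and output format), and the sketch only handles IPv4.

```python
import ipaddress


def anonymize_ipv4(ip: str, anon_octets: int = 1) -> str:
    """Replace the last `anon_octets` octets of an IPv4 address with 'x'.

    Hypothetical illustration of octet-based anonymisation,
    not Snowplow's actual enrichment code.
    """
    if not 0 <= anon_octets <= 4:
        raise ValueError("anon_octets must be between 0 and 4")
    # Validate and normalise the address; raises ValueError for non-IPv4 input.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    # Keep the leading octets and mask the trailing ones.
    kept = octets[: 4 - anon_octets]
    return ".".join(kept + ["x"] * anon_octets)
```

For example, `anonymize_ipv4("203.0.113.42", 2)` returns `"203.0.x.x"`. Masking more octets gives stronger anonymity at the cost of coarser geolocation.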
Further to @Colm's note: if you want to anonymize any other field in your Snowplow data, stay tuned for our next release (R100), which introduces a "PII Enrichment" that lets you do precisely that.
IP anonymisation is already available, and, as Alex points out, broader PII anonymisation is scheduled for R100. Together, those two take care of the enriched events on S3 and the data in Redshift.
However, the raw collector logs in S3 still need to be dealt with. The best solution at the moment is to set up S3 lifecycle rules to handle them. Please note that deleting the raw logs makes it impossible to reprocess that data, so it's best to keep a buffer period before you delete anything from the raw bucket, in case there's a pipeline failure.
For example, a solution that has worked in the past is a lifecycle rule that deletes files from the S3 buckets after one week.
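As a minimal sketch of such a rule, the Python function below builds an S3 lifecycle configuration that expires objects under a given prefix after a number of days. The function name, the rule ID and the `raw/` prefix are hypothetical; point the prefix at wherever your raw collector logs actually land, and pick a number of days that leaves enough buffer to detect and recover from a pipeline failure.

```python
import json


def raw_logs_expiration_rule(prefix: str, days: int = 7,
                             rule_id: str = "expire-raw-collector-logs") -> dict:
    """Build an S3 lifecycle configuration expiring objects under `prefix` after `days` days.

    Hypothetical helper: the returned dict follows the structure accepted by
    the S3 PutBucketLifecycleConfiguration API.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # only objects under this prefix
                "Status": "Enabled",
                "Expiration": {"Days": days},  # delete objects this many days after creation
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical prefix: replace with the location of your raw collector logs.
    config = raw_logs_expiration_rule("raw/", days=7)
    print(json.dumps(config, indent=2))
```

The printed JSON can be saved to a file and applied with `aws s3api put-bucket-lifecycle-configuration --bucket <your-raw-bucket> --lifecycle-configuration file://lifecycle.json`, or the dict can be passed to boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config)`. Note that this call replaces the bucket's entire existing lifecycle configuration, so merge in any rules you already have.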