Snowplow Event Recovery on GCP

Hi everyone,
We want to use Snowplow Event Recovery on GCP to recover enrichment failures. My question is about the recovered events: is there a field in BigQuery that identifies events recovered by this process?

Thanks.

Hi @Stefania_Iellamo,

There is no such field. To validate the recovery, you need to compare the load_tstamp column with collector_tstamp: a recovered event keeps its original collector_tstamp but is loaded much later. Additionally, you can see in the recovery job itself how many events were successfully recovered. Unfortunately, event recovery for GCP is not properly documented IMO, and I found it pretty hard to get it running. I am planning to write a short guide about it.
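A minimal sketch of that comparison, in case it is useful. It only builds a BigQuery query string; the table name, the 6-hour threshold, and the helper name are assumptions for illustration, and event_id is assumed to be the usual Snowplow event identifier column. A large lag is a hint rather than proof, since late-arriving data shows the same pattern.

```python
def build_recovered_events_query(table: str, min_lag_hours: int = 6) -> str:
    """Return a BigQuery SQL string listing events loaded long after collection.

    `table` and `min_lag_hours` are illustrative; adjust them to your pipeline.
    """
    if min_lag_hours < 0:
        raise ValueError("min_lag_hours must be non-negative")
    return (
        "SELECT event_id, collector_tstamp, load_tstamp,\n"
        "       TIMESTAMP_DIFF(load_tstamp, collector_tstamp, HOUR) AS lag_hours\n"
        f"FROM `{table}`\n"
        f"WHERE TIMESTAMP_DIFF(load_tstamp, collector_tstamp, HOUR) >= {min_lag_hours}\n"
        "ORDER BY lag_hours DESC"
    )


if __name__ == "__main__":
    # Hypothetical table name; paste the printed query into the BigQuery console.
    print(build_recovered_events_query("my-project.snowplow.events"))
```

Narrowing the WHERE clause further to the collector_tstamp window of the failures you recovered should make the result easier to match against the count reported by the recovery job.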

Hope that helps.
David

Hi @davidher_mann,
thanks for the reply. It helps!

Can I ask how you usually manage the recovery? I need to recover enrichment failures, and during my tests I saw that the files used as input for the recovery are not deleted from the “badrows” bucket. To avoid recovering them multiple times, do you move them to another bucket before running the recovery command?

Thanks in advance.
Stefania

Hi Stefania,

I think recovering events multiple times is only a real issue if a lot of people are doing recoveries on the same pipeline, and I would ignore it for now. In general, it would be possible to transfer the files in GCP to a dedicated folder, since you can select a bucket/folder of your choice in the inputDirectory parameter. I think it's not worth the effort because:

  • If you recover the same events multiple times, the duplicates will be filtered out in the data modeling process, since it is normal to have duplicates in the raw table anyway.
  • To run the job you need to specify the folders in the bad events bucket (inputDirectory parameter). Since the buckets are partitioned by year/month/day/hour, you can specify the start and end date/hour. Additionally, you can include a filter, e.g. on the collector timestamp (config parameter). With a proper config you can select the bad events for your recovery job very specifically, which reduces the risk.
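To illustrate the folder selection, here is a rough sketch that lists the hourly folders between a start and an end hour. The bucket path and the exact zero-padded year/month/day/hour layout are assumptions, so check them against your own bucket; how the folders are then passed to inputDirectory depends on your job setup.

```python
from datetime import datetime, timedelta
from typing import List


def hourly_prefixes(root: str, start: datetime, end: datetime) -> List[str]:
    """List hourly folder prefixes from start to end (inclusive).

    Assumes a <root>/YYYY/MM/DD/HH/ layout, which may differ in your bucket.
    """
    if end < start:
        raise ValueError("end must not be before start")
    # Truncate to the full hour so every step lands on a folder boundary.
    current = start.replace(minute=0, second=0, microsecond=0)
    prefixes = []
    while current <= end:
        prefixes.append(f"{root.rstrip('/')}/{current:%Y/%m/%d/%H}/")
        current += timedelta(hours=1)
    return prefixes


if __name__ == "__main__":
    # Hypothetical bucket and time window.
    for prefix in hourly_prefixes(
        "gs://example-bad-rows/enrichment_failures",
        datetime(2023, 1, 5, 22),
        datetime(2023, 1, 6, 1),
    ):
        print(prefix)
```

Keeping a note of the window you already recovered is a cheap way to avoid running the same hours twice without moving any files.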

You mentioned that you did a test: have you already managed to create a recovery job in Dataflow successfully, or where are you currently stuck?

Best,
David

Hi David,
thanks for your reply.

I have already managed a successful recovery in test. I’m now thinking about the best procedure to recover the large number of failures we had before we removed an enrichment that was causing errors. I don’t think we will need to recover often in the future.

Thanks for your support.
Stefania