alex
June 25, 2017, 5:20pm
Hi @mjensen - check out this documentation:
snowplow/snowplow: the enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP
And this tutorial:
Our new Hadoop Event Recovery project (documentation) lets you fix up Snowplow bad rows and make them ready for reprocessing by writing your own custom JavaScript to execute on each bad row.
While this is a powerful tool, using it can be quite involved. This tutorial walks you through one common use case for event recovery: where some of your events failed validation because you forgot to upload a particular schema. Let’s get started.
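For the missing-schema case, the custom JavaScript boils down to passing each bad row's raw event through unchanged once the schema has been uploaded. Here is a minimal sketch; the `process(event, errors)` signature, the shape of `errors` as an array of message strings, and the exact error text are assumptions, so check the Hadoop Event Recovery documentation for the contract your version expects:

```javascript
// Sketch of a recovery script for the missing-schema case.
// Assumption: the runtime calls process(event, errors), where `event` is the
// raw tab-separated event line and `errors` is an array of error messages;
// returning the line re-queues it for reprocessing, returning null drops it.
function process(event, errors) {
  // Only recover rows that failed solely because a schema could not be found.
  for (var i = 0; i < errors.length; i++) {
    if (errors[i].indexOf('Could not find schema') === -1) {
      return null; // some other validation failure: leave this row alone
    }
  }
  // The schema has been uploaded by now, so the original line is valid as-is.
  return event;
}
```

Returning `null` for rows with unrelated failures keeps the recovery run conservative: only the rows you have actually fixed the root cause for get reprocessed.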
Refresher on Snowplow bad rows
Snowplow bad rows look like…
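The quoted excerpt is truncated above. For orientation, a legacy bad row is a JSON object pairing the original raw line with the validation errors that rejected it; the field names below follow the legacy format, but the sample values are invented:

```javascript
// Illustrative legacy bad row (sample values are invented, not real data).
// "line" holds the original raw collector payload; "errors" explains why
// enrichment rejected it.
const badRow = JSON.stringify({
  line: '2017-06-25\t17:20:00\t...', // raw tab-separated event (truncated)
  errors: [
    {
      level: 'error',
      message: 'Could not find schema with key iglu:com.acme/checkout/jsonschema/1-0-0'
    }
  ],
  failure_tstamp: '2017-06-25T17:20:00.000Z'
});

// Recovery tooling parses each row and inspects the error messages:
const parsed = JSON.parse(badRow);
console.log(parsed.errors.map(e => e.message));
```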
Because you are only using GETs, you won’t encounter the duplication problem that can occur when recovering bad events from POST payloads (see the caveats section in the documentation).
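The reason GETs are safe is that each GET carries exactly one event, so a bad row maps to exactly one failed event; a POST payload can batch many events, and if only one of them failed validation, naively recovering the whole payload re-emits the events that already succeeded. A toy illustration (the payload shape here is invented, not the actual Snowplow POST body):

```javascript
// Toy illustration of the duplication caveat (invented payload shape).
// A GET bad row holds one event; a POST bad row may hold a batch.
const getBadRow = { events: ['e1'] };              // e1 failed
const postBadRow = { events: ['e1', 'e2', 'e3'] }; // only e2 failed

// Naively recovering the whole payload:
const recoveredFromGet = getBadRow.events;   // ['e1'], exactly the failed event
const recoveredFromPost = postBadRow.events; // ['e1', 'e2', 'e3'], but e1 and
                                             // e3 were already processed, so
                                             // they would be loaded twice
```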