Make Scala stream enrichments optional

Context

We are using a bunch of enrichments, such as geo IP lookup, user agent parsing, IP anonymization and weather enrichment. Some of them are essential, either because there is a legal obligation or simply because a business rule requires certain information to be attached to an event, without which the event is not useful. Other enrichments are purely optional, and we don’t want an event to end up in the bad stream just because one of them failed. The split may be completely different for another use case.

Suggestion

Does it make sense to introduce another boolean parameter in each enrichment config, called “optional” or some other descriptive name?

  • True: the enrichment failed, but that is not a problem; we pass the event on anyway.
  • False: the enrichment failed, so the event is broken and we send it to the bad queue.

This way companies can assemble their own combination of enrichments and stay in full control of which failures are severe and which are not.
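A minimal Scala sketch of the proposed routing rule, assuming a hypothetical boolean `optional` field per enrichment (true meaning a failure is tolerated). `EnrichmentConfig`, `Routing.route` and the enrichment names are illustrative only, not part of Snowplow's actual configuration or code.

```scala
// Hypothetical per-enrichment config: `optional` is the flag proposed in this
// thread, not an existing Snowplow config field.
final case class EnrichmentConfig(name: String, optional: Boolean)

// Outcome of running a single enrichment on an event.
sealed trait EnrichmentResult
final case class Enriched(name: String) extends EnrichmentResult
final case class Failed(name: String, reason: String) extends EnrichmentResult

// Where the event ends up after all enrichments have run.
sealed trait Route
case object GoodStream extends Route
final case class BadStream(reasons: List[String]) extends Route

object Routing {
  // Only failures of non-optional enrichments send the event to the bad stream;
  // failures of optional enrichments are tolerated and the event is passed on.
  def route(configs: List[EnrichmentConfig], results: List[EnrichmentResult]): Route = {
    val optionalByName: Map[String, Boolean] = configs.map(c => c.name -> c.optional).toMap
    val fatal = results.collect {
      // Enrichments missing from the config map are treated as required (safe default).
      case Failed(name, reason) if !optionalByName.getOrElse(name, false) => s"$name: $reason"
    }
    if (fatal.isEmpty) GoodStream else BadStream(fatal)
  }
}
```

Defaulting unknown enrichments to required keeps today's behaviour (failures go to the bad stream) unless a user explicitly opts out.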


Hi @christoph-buente! It’s an interesting idea, thanks for the suggestion.

If the enrichment fails, the suggestion is to pass the event on, but would you record the enrichment failure with the event in any way?

Good question. I think it makes sense to at least log it. If an enrichment did not do its work, you can see that in the event itself, right? For example, if the user agent is not parsed, the event will be missing the corresponding context.

Well, an enrichment failure and the reason for it are pretty interesting metadata, compared with the mere “absence of the enrichment output”, which doesn’t even tell us whether the enrichment was enabled.
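To make the distinction concrete: if only the enrichment output is kept (an `Option`), “not enabled” and “failed” both collapse into `None`. A minimal Scala sketch with a hypothetical three-state status (`NotEnabled`, `Succeeded`, `FailedWith`; none of these are Snowplow types) keeps them apart.

```scala
// Hypothetical status of one enrichment for one event.
sealed trait EnrichmentStatus
case object NotEnabled extends EnrichmentStatus
final case class Succeeded(output: Map[String, String]) extends EnrichmentStatus
final case class FailedWith(reason: String) extends EnrichmentStatus

object Status {
  // What a downstream consumer sees if only the output is kept.
  def outputOnly(s: EnrichmentStatus): Option[Map[String, String]] = s match {
    case Succeeded(out) => Some(out)
    case _              => None // NotEnabled and FailedWith become indistinguishable
  }

  // With the explicit status, the failure reason survives.
  def describe(s: EnrichmentStatus): String = s match {
    case NotEnabled         => "not enabled"
    case Succeeded(_)       => "succeeded"
    case FailedWith(reason) => s"failed: $reason"
  }
}
```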

This topic feeds into this more generalized ticket:

https://github.com/snowplow/snowplow/issues/351

I agree. But since there is error message support now, how about generating an additional event when an enrichment fails and passing the original one on as is?
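A minimal Scala sketch of that idea, assuming hypothetical `TrackedEvent` and `EnrichmentFailureEvent` types (not real Snowplow classes): the original event passes through unchanged, and each failed enrichment yields one extra event that points back to the original by its id.

```scala
// Hypothetical, simplified event types for illustration only.
final case class TrackedEvent(eventId: String, fields: Map[String, String])
final case class EnrichmentFailureEvent(originalEventId: String, enrichment: String, message: String)

object FailureFanOut {
  // Pass the original event on unchanged and emit one additional failure event
  // per failed enrichment, linked back via the original event id.
  def process(
      event: TrackedEvent,
      failures: List[(String, String)] // (enrichment name, error message)
  ): (TrackedEvent, List[EnrichmentFailureEvent]) = {
    val extra = failures.map { case (name, msg) =>
      EnrichmentFailureEvent(event.eventId, name, msg)
    }
    (event, extra)
  }
}
```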

Is there any plan to support this feature in the near future? I’ve seen some similar discussions here (https://github.com/snowplow/snowplow/issues/3485), but it’s been a while since anyone commented.

I think a good option would be to return a warning message inside the respective derived context when users decide not to send events to the bad stream for a given enrichment failure. This should be fairly easy to implement, even easier to filter on in Redshift or other tools, and wouldn’t change the data pipeline, while giving users more control.
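A rough Scala sketch of what such a warning context might look like. Contexts are modelled here as self-describing JSON (`schema` plus `data`); the Iglu URI `iglu:com.example/enrichment_warning/jsonschema/1-0-0`, the field names and the `WarningContext` helper are all made up for illustration. The hand-rolled JSON rendering only escapes backslashes and double quotes; a real implementation would use a JSON library.

```scala
// Hypothetical self-describing derived context carrying a warning.
final case class DerivedContext(schema: String, data: Map[String, String])

object WarningContext {
  // Made-up Iglu URI (com.example vendor) for illustration only.
  val WarningSchema = "iglu:com.example/enrichment_warning/jsonschema/1-0-0"

  // Context attached to the event when an optional enrichment fails;
  // the event itself still goes to the good stream.
  def warning(enrichment: String, message: String): DerivedContext =
    DerivedContext(WarningSchema, Map("enrichment" -> enrichment, "warning" -> message))

  // Minimal escaping: backslashes first, then double quotes.
  private def esc(s: String): String = s.replace("\\", "\\\\").replace("\"", "\\\"")

  // Render as compact JSON with keys sorted for a stable output.
  def toJson(c: DerivedContext): String = {
    def quote(s: String): String = "\"" + esc(s) + "\""
    val fields = c.data.toList.sortBy(_._1)
      .map { case (k, v) => quote(k) + ":" + quote(v) }
      .mkString(",")
    "{" + quote("schema") + ":" + quote(c.schema) + "," + quote("data") + ":{" + fields + "}}"
  }
}
```

A flat context like this keeps the enrichment name and message as separate fields, which is what makes filtering on them in a warehouse straightforward.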