Dumb question, but I’ve been through as much of the documentation as I can find. I have custom events stored in S3 and want to batch process them once an hour with additional validation rules and enrichment data mappings. For example: take the contents of the URL referrer field and, if it’s known, translate it to X, otherwise translate it to Y. Where and how do I program Scalding to define this as a MapReduce function? I’ve found the EmrEtlRunner config file, but I’m not seeing where the actual business logic resides.
Yes, I saw that, but it was not clear that this is the primary extension mechanism. So the Hadoop parallelism will be by event, and I should just make a call out to an external service that does the data translation of the various parameters? No data lookups in Hadoop this way, correct?
Correct - the Hadoop parallelism is by event. We are working on adding support for writing a custom enrichment as a packaged JVM jar (so you could write it in Java or Scala), but in the meantime, yes, the JavaScript enrichment is the way to go.
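To answer the original question: the business logic lives in the script you pass to the JavaScript enrichment - you define a `process(event)` function and return an array of self-describing contexts to attach to the event. A minimal sketch of the referrer mapping you described might look like the following; the `getPage_referrer()` getter and the `com.acme/referrer_mapping` schema are assumptions for illustration, so double-check the getter names against the enriched event POJO and register your own schema in Iglu:

```javascript
// A minimal sketch of a JavaScript enrichment script (Rhino, so ES5).
// Assumes the enriched event exposes a getPage_referrer() getter and that
// com.acme/referrer_mapping is a schema you have defined yourself.
var KNOWN_REFERRERS = {
  "https://www.google.com/": "search",
  "https://t.co/": "social"
};

function process(event) {
  var referrer = String(event.getPage_referrer() || "");

  // If the referrer is known, translate it to X; otherwise fall back to Y.
  var mapped = KNOWN_REFERRERS[referrer] || "unknown";

  // Return a derived context which gets attached to the event.
  return [{
    schema: "iglu:com.acme/referrer_mapping/jsonschema/1-0-0",
    data: {
      originalReferrer: referrer,
      mappedReferrer: mapped
    }
  }];
}
```

The script itself is base64-encoded into the enrichment’s JSON configuration file, so you don’t need to touch the Scalding job or the EmrEtlRunner config at all.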
If you’d rather not put the logic inside the JavaScript enrichment, in R79 you’ll be able to integrate an external service holding the logic, using the API Request Enrichment.
@alex, I was asking whether it’s possible to do that in the Rhino JavaScript enricher. Sorry, I shouldn’t have said ‘enrichment’; I was talking about this particular JavaScript enricher.
I am pretty sure it’s possible to make an HTTP call from inside the JavaScript enrichment - but if you can, it would be cleaner to handle the error in-band, simply returning an error context which will be attached to the event for further processing downstream.
It means you can run and rerun the Snowplow enrichment process without causing side effects in other systems (in functional programming terms, pure versus impure function).
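To make “in-band” concrete, something along these lines should work - the `enrichment_error` schema and the `riskyLookup` helper are just placeholders you would replace with your own:

```javascript
// Sketch of handling a failure in-band instead of letting an exception
// escape the enrichment. Schema URIs and riskyLookup are illustrative only.
function riskyLookup(referrer) {
  // Placeholder for whatever might fail, e.g. parsing or an HTTP call.
  if (!referrer) {
    throw "no referrer on event";
  }
  return String(referrer).toLowerCase();
}

function process(event) {
  try {
    var mapped = riskyLookup(event.getPage_referrer());
    return [{
      schema: "iglu:com.acme/referrer_mapping/jsonschema/1-0-0",
      data: { mappedReferrer: mapped }
    }];
  } catch (e) {
    // Returning an error context keeps the event moving through the
    // pipeline; downstream jobs can filter on this context instead.
    return [{
      schema: "iglu:com.acme/enrichment_error/jsonschema/1-0-0",
      data: { message: String(e) }
    }];
  }
}
```

Because the failure is recorded as data rather than thrown, the event still reaches the end of the pipeline and a rerun stays free of side effects.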
Our whole idea is to avoid errors wherever possible and make sure every event reaches the end of the pipeline. We plan to achieve this by normalizing incoming data: for example, if we receive a string in a field where we expect an integer, we just fix it in place, convert it to the right type, and log it (we keep logs in Elasticsearch, BTW). By analyzing the logs we can then fix the problems in our code. There could be many situations (especially in the early stages of developing our analytics) where simply adding a new field to self-describing events and contexts would cause whole batches of data to be moved to the bad bucket, which is really not good for us.
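Roughly what we have in mind, as a sketch - `coerceToInt`, the field name and the `normalization_fix` schema are just illustrative names of ours, and in practice we would apply this to the fields of our self-describing events and contexts:

```javascript
// Sketch of the normalization we have in mind, done inside the JavaScript
// enrichment: coerce a value that should have been an integer and record
// what happened in a derived context we can later analyze in Elasticsearch.
function coerceToInt(name, value, fixes) {
  if (typeof value === "string") {
    var parsed = parseInt(value, 10);
    if (!isNaN(parsed)) {
      fixes.push({ field: name, from: value, to: parsed });
      return parsed;
    }
  }
  return value;
}

function process(event) {
  var fixes = [];

  // Example only: a field that should be an integer but arrived as a string.
  // In reality the value would be pulled from our own event/context fields.
  var quantity = coerceToInt("quantity", "42", fixes);

  if (fixes.length === 0) {
    return [];
  }
  // The derived context ends up alongside the event, so we can review the
  // fixes later and correct our tracking code.
  return [{
    schema: "iglu:com.acme/normalization_fix/jsonschema/1-0-0",
    data: { fixes: fixes }
  }];
}
```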