dbt snowplow_web: custom event filtering

I am testing the snowplow_web dbt data model (v0.6.0) and am considering adding an IP address blacklist. Is there any way to add custom event filtering to the model without modifying the source code?

Not really. That said, the web dbt models are designed as a base starting point, so feel free to incorporate your own logic into them or to use downstream models to perform that filtering!


Thanks for your answer, @mike! I am willing to create a custom filtering step in the model. Should I consider making a pull request as well? Do you have a suggestion as to which approach would be best?

I don’t work on the web models closely, but @Emiel probably has some views on this re: pull requests.

If you are primarily filtering on IP address, I’d consider maintaining a table of IP addresses and/or CIDR ranges. This would make it flexible enough for others to adopt without hardcoding any addresses into the dbt model itself. I’d then consider adding a variable that lets you point to that table and enable or disable the feature, e.g.,

snowplow__ip_filter = ip_filter_list_reference
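As a rough illustration of the matching logic such a table implies, here is a minimal Python sketch using the standard-library ipaddress module. The blacklist entries, the `ip_filter_enabled` flag and the function names are hypothetical stand-ins for the table and variable described above; in a dbt project the equivalent check would be written in SQL and would depend on your warehouse's IP/CIDR functions. `user_ipaddress` is the Snowplow atomic events column holding the IP.

```python
import ipaddress
from typing import Dict, Iterable, List, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def load_ip_filter(entries: Iterable[str]) -> List[Network]:
    """Parse single addresses and CIDR ranges into network objects.

    A bare address such as "203.0.113.7" becomes a /32 (or /128 for IPv6)
    network, so exact addresses and ranges are handled the same way.
    """
    return [ipaddress.ip_network(entry.strip(), strict=False) for entry in entries]


def is_blocked(ip: str, networks: List[Network]) -> bool:
    """Return True if `ip` falls inside any blacklisted network."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Missing or malformed IPs are kept rather than silently dropped.
        return False
    # `in` returns False when IP versions differ, so mixing v4/v6 entries is safe.
    return any(addr in net for net in networks)


def filter_events(
    events: List[Dict[str, Optional[str]]],
    networks: List[Network],
    ip_filter_enabled: bool = True,  # hypothetical on/off switch, like a dbt var
) -> List[Dict[str, Optional[str]]]:
    """Drop events whose user_ipaddress matches the blacklist."""
    if not ip_filter_enabled:
        return list(events)
    return [e for e in events if not is_blocked(e.get("user_ipaddress") or "", networks)]


# Usage with made-up documentation-range addresses.
blacklist = load_ip_filter(["203.0.113.7", "198.51.100.0/24", "2001:db8::/32"])
events = [
    {"event_id": "a", "user_ipaddress": "203.0.113.7"},
    {"event_id": "b", "user_ipaddress": "198.51.100.42"},
    {"event_id": "c", "user_ipaddress": "192.0.2.1"},
    {"event_id": "d", "user_ipaddress": "2001:db8::1"},
    {"event_id": "e", "user_ipaddress": None},
]
print([e["event_id"] for e in filter_events(events, blacklist)])  # ['c', 'e']
```

Normalising every entry to a network object keeps the lookup table a single list, which mirrors the idea of one reference table that can hold both individual addresses and ranges.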

Thanks once again for the quick answer!
@Emiel, sorry for approaching you directly, but is there a chance to arrange a 1:1 conversation with you? I have been using Snowplow for 5 years already, and the snowplow_web dbt package was a pleasant surprise! I would really appreciate it if you could share your vision of the Snowplow data model and how it should be developed.

Hi @Gediminas_Pukys
Not sure if you have seen the Oct 2021 Product Office Hours session, ‘modeling your snowplow data in dbt’, but it might prove useful.
Cheers,
Eddie

Hi @EddieM, thanks for your reply.
It was helpful for a deeper understanding, at least. However, I still need to find out how to solve my problem properly. I would like to add an intermediate step as follows: snowplow_web_base_events_this_run → snowplow_web_base_events_this_run_filtered_ips → snowplow_web_pv_scroll_depth.
I would prefer to add it here for performance reasons, but I am open to any solution (preferably one that doesn’t involve building custom downstream models myself).
Thanks for your valuable help!
Gediminas

Could you achieve this by running a custom module after the base module which modifies the snowplow_web_base_events_this_run table?

Alternatively, you could fork the base module, but I’d recommend avoiding that if at all possible.

Hi @Colm, this is exactly what I want to do! But all downstream references, {{ ref('snowplow_web_base_events_this_run') }}, point to the snowplow_web_base_events_this_run table. How can I reroute the reference to snowplow_web_base_events_this_run_filtered_ips without forking the base model?

My suggestion is to modify that table without renaming it.

I normally wouldn’t recommend this, but I don’t know if there’s a better option at the moment.

I do recognise that this is a limitation of the models, and it is something we hope to improve in the future, but in the short- to medium-term I’m not sure what will get prioritised, so I think modifying the table in place is an acceptable option for now. 🙂
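To make the in-place idea concrete, the sketch below uses an in-memory SQLite database as a stand-in for the warehouse. It builds a toy snowplow_web_base_events_this_run table and a hypothetical ip_filter_list table of addresses/CIDRs, then deletes matching rows without renaming anything, so every downstream {{ ref('snowplow_web_base_events_this_run') }} would read the filtered rows. The `ip_filter_list` table, the `ip_in_network` SQL helper and the column layout are assumptions for illustration only; how you trigger the DELETE in dbt (for example as the custom module run after the base module, as suggested above) and which IP functions your warehouse offers are not shown here.

```python
import ipaddress
import sqlite3


def ip_in_network(ip, cidr):
    """Hypothetical SQL helper: 1 if `ip` lies inside `cidr` (bare address = /32 or /128)."""
    try:
        return int(ipaddress.ip_address(ip) in ipaddress.ip_network(cidr, strict=False))
    except (TypeError, ValueError):
        # NULL or malformed values never match, so those rows are kept.
        return 0


def filter_base_events_in_place(conn):
    """Delete blacklisted rows from the base events table, keeping its name."""
    conn.execute(
        """
        DELETE FROM snowplow_web_base_events_this_run
        WHERE EXISTS (
            SELECT 1 FROM ip_filter_list f
            WHERE ip_in_network(snowplow_web_base_events_this_run.user_ipaddress, f.ip_or_cidr)
        )
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
# Register the Python helper so it can be called from SQL.
conn.create_function("ip_in_network", 2, ip_in_network)
conn.executescript(
    """
    CREATE TABLE snowplow_web_base_events_this_run (event_id TEXT, user_ipaddress TEXT);
    INSERT INTO snowplow_web_base_events_this_run VALUES
        ('a', '203.0.113.7'), ('b', '198.51.100.42'), ('c', '192.0.2.1'), ('d', NULL);
    -- Hypothetical blacklist table of single addresses and CIDR ranges.
    CREATE TABLE ip_filter_list (ip_or_cidr TEXT);
    INSERT INTO ip_filter_list VALUES ('203.0.113.7'), ('198.51.100.0/24');
    """
)

filter_base_events_in_place(conn)
remaining = [
    row[0]
    for row in conn.execute(
        "SELECT event_id FROM snowplow_web_base_events_this_run ORDER BY event_id"
    )
]
print(remaining)  # ['c', 'd']
```

Because the table keeps its name, nothing downstream needs rerouting. The trade-off is that the base table for that run no longer holds the unfiltered events, which is why this is framed as a stop-gap rather than a recommended pattern.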