Filtering out bot traffic from a specific user agent

We integrated with an observability service and need to filter out its traffic based on user agent. Do we need to create our own JS enricher, or can we edit the user_agent_utils_config to filter out events from these bots?

user_agent_utils just extracts information from useragents; it doesn’t filter traffic.

If you need to force events to fail validation based on the useragent, then yes, at present the JS enrichment can be used.

However I don’t think you’d normally need to - if the service is just sending pings to the collector, and not valid Snowplow events, then that will fail validation regardless, since it won’t conform to the tracker protocol. If it pings the collector’s /health endpoint, it will also be filtered out.

Thanks @Colm for your reply. This service triggers all the actions on our website to check the system health, so I can see its events in atomic after they pass through enrichment.

So the only option left is a JS enricher that forces these events to fail validation. Is there any documentation on how to create one?

https://docs.snowplowanalytics.com/docs/enriching-your-data/available-enrichments/custom-javascript-enrichment/
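
As a rough sketch of what such an enrichment could look like: the script defines a process(event) function, reads the useragent from the event, and throws when it matches. This assumes that an exception thrown inside process causes the event to fail enrichment and land in the bad/failed events, which is worth confirming against the docs above for your pipeline version. The ExampleHealthBot pattern is a made-up placeholder for whatever useragent your observability service sends.

```javascript
// Hypothetical pattern: replace with the useragent your monitoring service actually sends.
var BOT_UA_PATTERN = /ExampleHealthBot/i;

// Entry point that the custom JavaScript enrichment is expected to call for each event.
function process(event) {
  // Coerce to a JS string; the getter may return null or a non-JS string type.
  var userAgent = String(event.getUseragent() || '');

  if (BOT_UA_PATTERN.test(userAgent)) {
    // Assumption: throwing here makes the event fail enrichment
    // instead of reaching the enriched stream.
    throw new Error('Event from automated monitoring agent: ' + userAgent);
  }

  // No derived contexts to attach for normal traffic.
  return [];
}
```

The script is kept to plain ES5 syntax, since the engine that runs the enrichment may not support newer language features.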

If you don’t want to receive these events at all in the Snowplow pipeline, you should be able to set up a rule at the load balancer that returns a fixed response based on the UA header rather than forwarding on to the collector instances.

I like that solution from Mike - probably the best option available.

If for some reason you do want those events to be received successfully at the collector, but to fail validation, there’s a hacky way to go about it that would be more efficient than the JS enrichment.

You could detect the useragent client-side, and use the JavaScript tracker’s global contexts ( snowplow('addGlobalContexts', [array of global contexts]) ) to add a well-formed context to all events, one whose schema doesn’t exist in Iglu. If this fires before all other tracking methods, all events will fail validation with a ‘schema not found’ error.

Something like:

snowplow('addGlobalContexts', [{"schema": "iglu:com.acme/deliberateFail/jsonschema/1-0-0", "data": {"note": "Event came from automated process and should always fail validation"}}]);
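
To include the client-side detection step as well, a minimal sketch might look like the following. It assumes the tracker is loaded under the global name snowplow, and the ExampleHealthBot pattern is again a hypothetical placeholder for the monitoring service's real useragent. The helper names are made up for illustration.

```javascript
// Hypothetical pattern: replace with the useragent your monitoring service actually sends.
var BOT_UA_PATTERN = /ExampleHealthBot/i;

// True when the given useragent string looks like the monitoring service.
function isMonitoringAgent(userAgent) {
  return BOT_UA_PATTERN.test(userAgent || '');
}

// Returns the deliberately unresolvable context for bot traffic, or an empty array otherwise.
function deliberateFailContexts(userAgent) {
  if (!isMonitoringAgent(userAgent)) {
    return [];
  }
  return [{
    // This schema must NOT exist in any Iglu registry the pipeline resolves against.
    schema: 'iglu:com.acme/deliberateFail/jsonschema/1-0-0',
    data: {
      note: 'Event came from automated process and should always fail validation'
    }
  }];
}

// Browser-only wiring: run this before any other tracking calls.
if (typeof window !== 'undefined' && typeof window.snowplow === 'function') {
  var contexts = deliberateFailContexts(window.navigator.userAgent);
  if (contexts.length > 0) {
    window.snowplow('addGlobalContexts', contexts);
  }
}
```

Keeping the detection in a separate function makes it easy to check the pattern against real useragent strings before deploying it.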