Snowplow JS Authentication

Hello everyone!

I am writing to discuss our ongoing implementation of Snowplow and seek further clarification on a specific aspect related to the authentication of front-end calls using JavaScript.

Firstly, I would like to express our appreciation for the progress we have made so far with Snowplow. It appears to be a promising solution that aligns well with our needs. While we are not yet in production, we have encountered a point that requires a more in-depth understanding.

Our concern revolves around the authentication of front-end calls made through JavaScript. We would greatly appreciate it if you could provide us with further insights into this matter. Specifically, we would like to understand how the validation of sent requests is performed and any recommended best practices for ensuring the integrity and security of these requests.

Our utmost concern is to ensure the integrity and security of our Snowplow implementation. It is crucial for us to prevent any malicious agent from inspecting the page, capturing calls, or gaining unauthorized access to the collector URL. We aim to avoid situations where events are sent in random patterns, potentially generating a significant amount of false data or even causing disruptions to our infrastructure. Therefore, we kindly request your guidance and any recommended best practices to fortify our defenses against such risks. Your expertise in this matter would be invaluable in safeguarding our system and maintaining its smooth operation.


Hi there Guilherme. This is an interesting subject but I think we need to understand the threat model you’re dealing with in a bit more detail.

Snowplow—and other behavioural data sources—ultimately source their behavioural events from clients that we do not control. In the case of web, the code that generates the events (and the rest of the application) is available for viewing by anyone who wants to learn how it works. Anyone can work out where data is being sent.

Any approach to making this somehow “secure” is going to rely on obfuscation—making it difficult to work out what’s going on, but not impossible to a determined adversary.

One approach might be using server-side calls, which means you can be sure it was generated by your servers. But then how does the server know the client asked for something? By responding to a call from the client, which has the same flaws.

So perhaps you could explain the threats as you see them in a little more detail.

Well, on our side, the impacts we are considering if a malicious agent inspects the page and obtains the collector URL and the payload being sent are the following:

  • Sending non-compliant data (false data, dirtying the database)

  • Possibility of a collector/loader crash caused by a flood of requests: how do we block attacks and avoid large numbers of repeated requests?

For example, by inspecting this page I was able to get the collector URL and the payload sent. From there I could start sending a large number of requests, generating a mass of fake data and possibly causing a disruption to the collector.

Hi @ggasque,

  • Regarding your “non-compliant data” point: in my opinion, if you leverage schemas the right way and only send events based on custom schemas, the risk of non-compliant data is low. Your custom schemas are not exposed in the client, so a spammer cannot know how events are validated in enrich.

  • Regarding your collector/loader crash point: the easiest way to mitigate this is to route the tracker endpoint via a CDN/WAF such as Cloudflare, Akamai, or Fastly. An additional benefit, if set up properly: cookies set via the Snowplow endpoint (the sp cookie) are protected against Safari ITP.
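To illustrate the schema point: a custom event schema lives only in your private Iglu registry, never in the client. A minimal sketch of what such a schema could look like (the vendor, event name, and field are hypothetical), with `additionalProperties: false` so that enrich rejects payloads that don’t match:

```json
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "self": {
    "vendor": "com.acme",
    "name": "button_click",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "buttonId": { "type": "string", "maxLength": 255 }
  },
  "required": ["buttonId"],
  "additionalProperties": false
}
```

Events that fail validation against the schema end up in the bad rows output rather than in your warehouse.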

Hope that helps.


Just to add to David’s answer and list a couple of other options:

In case you are dealing with authenticated users, there is also the option to send auth tokens to the collector as a context entity and validate the tokens using the JS Enrichment or the API Enrichment. This would also enable you to enrich the events with additional user data based on the auth token.
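As a rough sketch of the client side of this (the schema URI and field names are hypothetical, and the commented-out call assumes the v3 Snowplow JS tracker API):

```javascript
// Build a custom context entity carrying the user's auth token so that a
// server-side JS/API Enrichment can verify it. Schema URI is hypothetical.
function buildAuthContext(authToken) {
  return {
    schema: 'iglu:com.acme/auth_context/jsonschema/1-0-0', // hypothetical
    data: { token: authToken },
  };
}

// In the browser, attached to every tracked event, e.g.:
// window.snowplow('trackPageView', {
//   context: [buildAuthContext(currentUser.token)],
// });
//
// The enrichment would then decode/verify the token server side and drop
// or flag events whose token does not validate.
```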

Another option is to implement the reCaptcha API to validate data attached to each event as proposed in this reCaptcha v3 enrichment RFC.
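For the reCaptcha route, a sketch of how a v3 token might be attached to events for server-side verification (`grecaptcha.execute` is the real reCAPTCHA v3 client API; the entity schema and site key are hypothetical):

```javascript
// Wrap a reCAPTCHA v3 token in a context entity so it can be checked
// server side, e.g. by the API Enrichment. Schema URI is hypothetical.
function buildRecaptchaContext(token) {
  return {
    schema: 'iglu:com.acme/recaptcha_token/jsonschema/1-0-0', // hypothetical
    data: { token: token },
  };
}

// In the browser:
// grecaptcha.execute('YOUR_SITE_KEY', { action: 'track' }).then(function (token) {
//   window.snowplow('trackPageView', {
//     context: [buildRecaptchaContext(token)],
//   });
// });
```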


Regarding the topic of non-compliant data, it should not be our biggest problem, considering that we have auth implemented.

About routing the tracker endpoint via a CDN/WAF, I’m going to explore this possibility in depth.


We are not dealing only with authenticated users, but maybe (as we are using AWS) SigV4 signing could be used to generate the auth token.

I will consider implementing the reCaptcha API.

While exploring possible ways to authenticate, a possibility came up:
I’m thinking about performing AWS SigV4 signing on the client side, because Kinesis accepts unsigned requests by default. To enforce SigV4 authentication, we could enable enhanced fan-out on the Kinesis stream, since enhanced fan-out requires signed requests.

You can certainly sign requests client side, but irrespective of the signing method, it depends on having a secret to sign the message with. Signing will probably deter users from sending targeted data, but if the signing method executes client side, that secret has to be available on the client. A sufficiently determined attacker can recover the secret and the signing method and still send dummy data.

As far as I’m aware there aren’t any analytics tools (or many other tools, for that matter) that prevent request tampering. Data sent from the client is assumed untrusted by default, so folks who want to prevent tampering tend to move these events server side rather than relying on code that executes on the client.

If you do come up with a way that you think prevents this I’d love to hear about it as it’s certainly something we could consider implementing.

Nice point of view. For sure, if I figure out a way to handle this I’ll share it with the community.