Limiting the querystring in URLs for Snowplow JS Tracker 2.15+

Hi all,

Is there any way to limit the querystring in the URLs being sent to Snowplow?

I found out that you can set the URL to something custom, e.g. snowplow('setCustomUrl', 'https://sdasd.com'), but is there a simpler way to do this? It gives me the impression that this has to be done on every page where page view events are tracked.

Hi @brajjany,

You could grab the URL with some JavaScript, strip the querystring out, and then set the result using setCustomUrl.
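A minimal sketch of that approach for the v2 tracker. It assumes the tag is loaded under its default global name (window.snowplow) and that the URL is rebuilt from window.location; the helper name trackPageViewWithoutQuery is hypothetical:

```javascript
// Hypothetical helper for the v2 JS Tracker: track a page view with the
// querystring removed from the page URL.
// Assumes the tracker tag is available as window.snowplow (the default name);
// pass a different function if your tag uses another global name.
function trackPageViewWithoutQuery(snowplow, location) {
  // origin + pathname + hash rebuilds the URL without the '?query' part.
  // The #fragment (if any) is kept; drop location.hash to remove it as well.
  const cleanUrl = location.origin + location.pathname + location.hash;

  // setCustomUrl overrides the page URL used by later tracking calls,
  // so it has to run before trackPageView.
  snowplow('setCustomUrl', cleanUrl);
  snowplow('trackPageView');
  return cleanUrl;
}

// In the browser, on the page whose URL carries sensitive parameters:
// trackPageViewWithoutQuery(window.snowplow, window.location);
```

Since it is a single function call, it could live in a shared page template rather than being copied into each page by hand.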

However, it’s an unusual thing to need to do: the enrichment process parses the URL into its individual components, so if you need to query the page URL or path (or a combination of them), you can do so using the canonical fields described here.
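To illustrate what that parsing gives you, here is a rough sketch using the standard URL API. The output keys are modeled on Snowplow's page URL fields (page_urlscheme, page_urlhost, page_urlport, page_urlpath, page_urlquery, page_urlfragment), but this is not the enrichment's actual code, and details such as default-port handling may differ; the function name splitPageUrl is hypothetical:

```javascript
// Approximate how a page URL splits into separate components.
// Key names mirror Snowplow's page_url* fields; the real enrichment may
// differ in details (e.g. it may fill in 80/443 when no port is given).
function splitPageUrl(pageUrl) {
  const u = new URL(pageUrl);
  return {
    page_urlscheme: u.protocol.replace(/:$/, ''), // 'https:' -> 'https'
    page_urlhost: u.hostname,
    page_urlport: u.port, // '' when the URL uses the scheme's default port
    page_urlpath: u.pathname,
    page_urlquery: u.search.replace(/^\?/, ''), // without the leading '?'
    page_urlfragment: u.hash.replace(/^#/, ''), // without the leading '#'
  };
}
```

So a query that filters on page_urlpath = '/thanks' matches every visit to that page, whatever its querystring.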

Is there a specific use case in mind whereby you need to strip the querystring prior to sending the data through?

Thanks for the reply @Colm :slight_smile:

This is the use case: after a form submit, the user is redirected to a destination page, and some user-specific information is generated. This data can be stored in the querystring of the URL, and it can be personally sensitive. We would therefore like to track events on this specific page but leave out the querystring.

I’m not sure how to solve this with v2 of the JS Tracker without calling setCustomUrl, but with v3 you could create a plugin.

That plugin could hook into the beforeTrack callback and cleanse the url field of any query parameters.

You’d want to call payloadBuilder.getPayload() and read the url property, then write it back once you’ve cleansed it with payloadBuilder.add('url', xxxx).
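A minimal sketch of such a plugin, assuming the v3 plugin shape where beforeTrack receives the event's payloadBuilder (as described above). It also cleans the referrer field refr, which goes beyond the original suggestion: the next page's referrer would otherwise carry the same querystring. The names stripQueryString and queryStringStripperPlugin are hypothetical, and the registration call at the bottom is how I understand the npm (@snowplow/browser-tracker) setup; check the plugin docs for the tag-based setup.

```javascript
// Remove the querystring from a URL, keeping any #fragment.
function stripQueryString(url) {
  try {
    const parsed = new URL(url);
    parsed.search = ''; // drops '?' and everything before any '#'
    return parsed.toString();
  } catch (e) {
    // Not an absolute URL: cut the query out by hand.
    const hashIndex = url.indexOf('#');
    const hash = hashIndex >= 0 ? url.slice(hashIndex) : '';
    const base = hashIndex >= 0 ? url.slice(0, hashIndex) : url;
    const queryIndex = base.indexOf('?');
    return (queryIndex >= 0 ? base.slice(0, queryIndex) : base) + hash;
  }
}

// Hypothetical v3 plugin: rewrites the page URL ('url') and, as an extra
// assumption, the referrer ('refr') before each event is sent.
function queryStringStripperPlugin() {
  return {
    beforeTrack: (payloadBuilder) => {
      const payload = payloadBuilder.getPayload();
      for (const key of ['url', 'refr']) {
        if (typeof payload[key] === 'string') {
          payloadBuilder.add(key, stripQueryString(payload[key]));
        }
      }
    },
  };
}

// Registration with the npm package (v3):
// import { addPlugin } from '@snowplow/browser-tracker';
// addPlugin({ plugin: queryStringStripperPlugin() });
```

Because the plugin runs before every event, page pings and other events from that page are cleaned too, without touching each tracking call.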

You can read about creating your own plugins here. You can also find a migration guide for v2 to v3 here.

If you have potentially personal data in the URL, I’d strongly encourage you to push for its removal from the URL wherever possible, rather than relying on your analytics tools to overwrite the URL.

Any third-party JavaScript you are running on the site may be deliberately (or accidentally) capturing this information and sending it to other tools (analytics, real user monitoring, crash reporting, ad tools, attribution tools, etc.), and it may also be getting captured in server logs. I know it’s not related to your original question, but a number of data breaches (Lufthansa et al.) have happened this way.
