Stream Enrich error parsing URL containing pipe | character

Hello!

As part of investigating differences between the data gathered by our tracker and GA, I managed to read the contents of the enrichment-bad Kinesis stream (following this example and writing a wrapper around it).

While it doesn’t explain our problem fully, I’ve discovered we’re losing some events due to an enrichment error, namely:

Provided URI string [https://www.example.com/en/thing.html#1_Type|9_Number?utm_source=remarketing&utm_medium=feed&utm_campaign=campaign] violates RFC 2396: [Illegal character in fragment at index 45: [https://www.example.com/en/thing.html#1_Type|9_Number?utm_source=remarketing&utm_medium=feed&utm_campaign=campaign]

The URL is an example, of course. The error comes from the | character in the URL. How can I pass URLs containing this character through the enricher without an error?

So netaporter (the URL parsing library) is technically doing the right thing here in rejecting that URL. The pipe character (and any other character that isn’t allowed in a URI) should be percent-encoded (e.g., %7C rather than |).
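
For what it’s worth, here is a minimal sketch of that kind of encoding in Python, assuming you control the code that builds the URL before it reaches the tracker. The function name encode_illegal_chars is made up for this example and is not part of Snowplow or any tracker API. It leaves the scheme and host alone and percent-encodes anything in the path, query and fragment that isn’t a legal URI character.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched: URI delimiters that are legal in the path,
# query and fragment, plus "%" so existing escapes are not encoded twice.
# (Assumption: the input has no stray "%" signs that need escaping.)
SAFE = "/?:@!$&'()*+,;=%"


def encode_illegal_chars(url: str) -> str:
    """Percent-encode characters such as "|" that a strict URI parser rejects.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,                      # host is left as-is
        quote(parts.path, safe=SAFE),
        quote(parts.query, safe=SAFE),
        quote(parts.fragment, safe=SAFE),  # the "|" in the example lives here
    ))


if __name__ == "__main__":
    raw = ("https://www.example.com/en/thing.html#1_Type|9_Number"
           "?utm_source=remarketing&utm_medium=feed&utm_campaign=campaign")
    print(encode_illegal_chars(raw))
```

Note that in the example URL everything after the # is the fragment, including the ?utm_source=... part, so a strict parser would not see those as query parameters even once the pipe is encoded.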


You might be interested in this issue, which explores relaxing URL parsing: https://github.com/snowplow/snowplow/issues/3880.

As Mike mentioned, pipe characters are actually illegal in URLs, so the preferable way to handle this is not to use them in your URLs. For some use cases, though, the option of a more relaxed standard is useful, so I believe that’s being explored.

Best,