I am using Snowplow version 75. During enrichment, a huge number of raw logs are going into the bad logs because the Netaporter library is not able to parse the event: "Illegal character in fragment" at a particular index. Could you please suggest ways to prevent this loss of data?
These rows fail because the URI contains an illegal character. The best way to prevent data loss is to update the actual URIs and remove (or percent-encode) the illegal character.
Hope this helps.
In some situations, fixing the actual URIs is not possible. We have a number of users who track events across their clients’ websites rather than their own. (Ad networks are an obvious example.) For these users, changing the URI is not practical.
In the medium term, our intention is to break out URI parsing into its own enrichment. It will then be possible to configure it to return empty values for, e.g., the different page_ fields, rather than invalidating the event as a whole.
In the short term, a possible workaround is to override the page URL with the JavaScript tracker's setCustomUrl method documented here.
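To make the setCustomUrl idea concrete, here is a minimal sketch of cleaning up the fragment on the client before the URL is sent. It assumes the problem characters are those RFC 2396 does not allow unescaped in a fragment, and it percent-encodes them rather than dropping them. The names `sanitizeFragment` and `encodeChar` are made up for this example, and the `snowplow` global in the trailing comment depends on how your tag is configured, so treat this as a starting point rather than a tested fix.

```typescript
// Percent-encode one character (or surrogate pair). A lone surrogate makes
// encodeURIComponent throw, so fall back to the encoded U+FFFD replacement char.
function encodeChar(ch: string): string {
  try {
    return encodeURIComponent(ch);
  } catch (_err) {
    return "%EF%BF%BD";
  }
}

// Matches, in order:
//   1. a "%" that is not the start of a valid %XX escape,
//   2. a surrogate pair (so it is encoded as one code point),
//   3. any single character outside the RFC 2396 reserved/unreserved sets.
const ILLEGAL_IN_FRAGMENT =
  /%(?![0-9A-Fa-f]{2})|[\uD800-\uDBFF][\uDC00-\uDFFF]|[^A-Za-z0-9_.!~*'();\/?:@&=+$,%-]/g;

// Hypothetical helper: leaves everything before the first "#" untouched and
// percent-encodes illegal characters in the fragment after it.
function sanitizeFragment(url: string): string {
  const hashIndex = url.indexOf("#");
  if (hashIndex === -1) {
    return url; // no fragment, nothing to do
  }
  const base = url.slice(0, hashIndex);
  const fragment = url.slice(hashIndex + 1);
  return base + "#" + fragment.replace(ILLEGAL_IN_FRAGMENT, encodeChar);
}

// Usage in the page, before tracking the page view (assumes the tracker
// function is exposed as the global "snowplow"):
//   snowplow("setCustomUrl", sanitizeFragment(window.location.href));
//   snowplow("trackPageView");
```

Encoding instead of stripping keeps the original information in the URL, at the cost of the stored page URL no longer matching the browser's address bar byte for byte.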
We’re also working on improving our technology around reprocessing bad rows. The goal is to make it straightforward to take a batch of bad rows, pick out those that failed for specific reasons (identified using the error messages included in the bad rows), and apply transformations to the data that address the issue, so the rows can be safely reprocessed. More details to follow as this is built out.
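Until that tooling exists, the first step (picking out bad rows by error message) can be sketched by hand. This is not the Snowplow reprocessing tool, just an illustration. It assumes each bad row is one JSON object per line with a `line` field and an `errors` array whose entries are either plain strings or objects with a `message` field; please check this against your own bad rows before relying on it. `BadRow`, `errorMessages` and `selectByError` are hypothetical names.

```typescript
// Assumed bad row shape; verify against the files in your bad rows bucket.
interface BadRow {
  line: string;
  errors: Array<string | { level?: string; message: string }>;
}

// Normalise both assumed error shapes to a list of message strings.
function errorMessages(row: BadRow): string[] {
  return row.errors.map((e) => (typeof e === "string" ? e : e.message));
}

// Parse newline-delimited JSON and keep only rows with at least one
// error message containing the given text.
function selectByError(ndjson: string, needle: string): BadRow[] {
  return ndjson
    .split("\n")
    .filter((l) => l.trim().length > 0)
    .map((l) => JSON.parse(l) as BadRow)
    .filter((row) =>
      errorMessages(row).some((m) => m.indexOf(needle) !== -1)
    );
}
```

Calling `selectByError(fileContents, "Illegal character in fragment")` would then return only the rows affected by this particular issue, whose `line` values could be repaired and resubmitted.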
Good question. This might be a question for @BenFradet or the others on the engineering team?