Koen87
November 9, 2016, 5:39am
1
Hi guys,
We are running into some weird GoogleBot behaviour. Normally GoogleBots are identified by Snowplow, but for some reason this bot isn’t. We can (of course) run our own User Agent checks and such, but I was hoping this was something already done inside of Snowplow, as it currently is with “br_type” or “br_family”.
As you can see in the screenshot, it works sometimes but not always… What is the current logic inside of Snowplow?
See screenshot below:
Hi @Koen87 ,
Thanks for flagging this. We use a third-party library to parse the useragent string, so I’m afraid it’s not within our direct control.
I recommend having a look at this thread too: Excluding bots from queries in Redshift [tutorial]
The regex on the useragent string does catch those exceptions.
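As a rough illustration of that kind of check, here is a minimal Python sketch of a regex applied to the useragent string. The pattern and the function name `is_probable_bot` are assumptions made for this example; they are not the exact expression from the tutorial thread, and a real list of crawler tokens would likely be longer.

```python
import re

# Hypothetical pattern: a few common crawler tokens, chosen for illustration.
# This is NOT the regex from the linked tutorial.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def is_probable_bot(useragent):
    """Return True when the useragent string contains a crawler-like token."""
    if not useragent:
        # Missing or empty useragent: nothing to match against.
        return False
    return BOT_PATTERN.search(useragent) is not None


googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(is_probable_bot(googlebot_ua))
```

A check like this works on the raw useragent string, so it does not depend on what the parsing library puts in “br_type” or “br_family”.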
Hope this helps,
Christophe
yali
November 9, 2016, 2:38pm
3
It might be that we need to upgrade the version of the useragent parsing libraries we’re using? @alex what’s the easiest way to check this?
Yes, it’s actually vital to keep that library up to date.
Koen87
November 9, 2016, 9:39pm
5
That would be great Yali. Let me know how you go.
alex
November 10, 2016, 2:16pm
6
Hi guys,
Okay so we have created tickets for refreshing both of our current useragent enrichments:
Scala Common Enrich: bump user-agent-utils to 1.20 #2930
Scala Common Enrich: bump ua-parser to latest version #2931
Unfortunately both libraries are problematic:
user-agent-utils was EOLed 13 days ago (though we still have one upgrade we can do). I have reached out to the author to find out more.
uap-java is not available on Maven Central and is not up-to-date with the latest uap project regexps.
So a fair bit of work on our side to get these libraries back on track, but it’s something we will take seriously.
yali
November 29, 2016, 2:47pm
7
A couple of users have recommended we look at WURFL as an alternative (paid for) library for user agent parsing. I’ve created a ticket here:
https://github.com/snowplow/snowplow/issues/2966