I’m seeing some domain_userids shared by different user_ids in my DB. (One domain_userid is even shared by 150+ user_ids.)
From other posts, my understanding is that domain_userid is the first-party cookie ID, generated as a UUID, and that it is already the best candidate for identifying unique users. As I understand it (correct me if I’m wrong), it is also tied to a customer’s device. If that’s the case, how can one device be shared by so many different users?
Moreover, these users all seem to be in different cities, with very dynamic customer data. Some of them were created in 2015 and 2016 and are still active in 2021. I was able to link some users by their IP addresses, but that still doesn’t account for 150+ users.
Since I’m working on anomaly detection, I know this is quite unusual and that these cases are the outliers in my analysis. However, I’d still like to know what the reasons behind this could be. Is it possible that a domain_userid can be stolen or misused somehow?
I’ve seen this happen in a bunch of circumstances and as you’ve mentioned it’s usually something weird going on that often involves diving a bit deeper into the data.
Occasionally domain_userids can of course collide - though it’s unusual and doesn’t explain what you are seeing here.
Corporate proxies, VPNs, and internal load balancers are another possibility. Sometimes these can do weird things with header / cookie caching that make the cookie stick across all the traffic that is proxied through a network.
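To put a rough number on why collisions are unlikely to explain 150+ user_ids on one cookie, here is a back-of-the-envelope sketch using the birthday approximation. It assumes domain_userid is a random version-4 UUID (122 random bits), which you should confirm for your tracker version; `collision_probability` is just an illustrative helper name.

```python
import math

# Assumption: domain_userid is a random (version 4) UUID, i.e. 122 random bits.
RANDOM_BITS = 122


def collision_probability(n_ids: int, random_bits: int = RANDOM_BITS) -> float:
    """Approximate chance that at least two of n_ids random IDs are equal.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / (2N)),
    where N = 2**random_bits is the number of possible IDs.
    """
    space = 2 ** random_bits
    exponent = n_ids * (n_ids - 1) / (2 * space)
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-exponent)
```

Even with a billion cookies, `collision_probability(10**9)` comes out vanishingly small under that assumption, so a genuine collision is a poor explanation for one ID being shared this widely.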
Something else to check: user_id is a field which you set with the tracker. It isn’t a Snowplow-controlled field but a completely customizable one, so I’d check your tracker implementation to ensure there isn’t an issue with how you are setting the user_id field for your users.
I suggest this because the example you give points to a single user (a single domain_userid, which is usually quite stable) getting multiple user_id values. Could they be logging in and out a lot, or could it be a bot using lots of logins? A couple of things to ponder there too.
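One way to start digging is to summarise, per domain_userid, how many distinct user_ids, IP addresses and user agents it has been seen with. The following is only a minimal sketch: it assumes you have already pulled events into Python as dictionaries keyed by the column names domain_userid, user_id, user_ipaddress and useragent, and `shared_cookie_report` is a hypothetical helper, not part of Snowplow.

```python
from collections import defaultdict


def shared_cookie_report(events, min_user_ids=2):
    """Summarise domain_userids that are seen with several user_ids.

    `events` is an iterable of dicts; the keys used here (domain_userid,
    user_id, user_ipaddress, useragent) are assumed column names.
    """
    user_ids = defaultdict(set)
    ips = defaultdict(set)
    agents = defaultdict(set)

    for event in events:
        duid = event.get("domain_userid")
        uid = event.get("user_id")
        if not duid or not uid:
            continue  # skip events without a cookie ID or a logged-in user
        user_ids[duid].add(uid)
        if event.get("user_ipaddress"):
            ips[duid].add(event["user_ipaddress"])
        if event.get("useragent"):
            agents[duid].add(event["useragent"])

    report = [
        {
            "domain_userid": duid,
            "user_ids": len(uids),
            "ips": len(ips[duid]),
            "useragents": len(agents[duid]),
        }
        for duid, uids in user_ids.items()
        if len(uids) >= min_user_ids
    ]
    # Most widely shared cookie IDs first.
    report.sort(key=lambda row: row["user_ids"], reverse=True)
    return report
```

As a rough heuristic rather than a rule: many user_ids behind a single user agent and a handful of IPs looks more like a shared device or a bot cycling through logins, while many user agents and IPs behind one domain_userid looks more like a cookie leaking through a proxy or cache, or a problem in how the tracker is set up.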