I’ve been dealing with several issues related to the different types of IDs: network, domain, session, user, etc. I believe I understand the purpose and logic of each, but each seems to apply in a different case.
I see in this link here reference to working from a table of user_ids to build out a marketing touch model. For my example, I’m interested in plotting a UX path through our app.
Regardless!
Would it be considered best practice to always work from a master user-concordance table that matches every ID value with every other? For a totally invented example: the single user_id (our customer ID), together with the two different domain IDs we have for that person, their two different network_ids, and their six different session_ids.
domain_userid is a first-party cookie set for web users. It’s pretty reliable but is tied to the domain name of the site on which it is set. If a user also visits another domain, they’ll get a new domain_userid for that activity. So a visitor who visited domainone.com and then domaintwo.com will have two different identifiers here. A COUNT(DISTINCT domain_userid) will come back with a value of 2.
network_userid is (potentially) a third-party cookie set by the collector in whatever domain it’s in. So if your collector is collector.domainone.com, it’s a first-party cookie for a visitor to domainone.com but a third-party cookie for a visitor to domaintwo.com. In modern privacy-preserving browsers, particularly Safari and Firefox, third-party cookies aren’t reliable. In Chrome, a visitor going to domainone.com and domaintwo.com should end up with a common network_userid, but for other browsers all bets are off. Use this identifier with care.
domain_sessionid gets a new value each time a user visits your site, so one domain_userid will usually be associated with many domain_sessionid values.
So if someone is identified on your site, you can tie that domain_userid to the user. A common way to do this is a lookup table, as you mentioned. There are also options for sloppier targeting, like grouping users on the same IP address.
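As a rough illustration of that lookup table, here is a minimal Python sketch. It assumes events arrive as plain dicts keyed by `user_id`, `domain_userid`, `network_userid` and `domain_sessionid`; the function name `build_concordance` and the output layout are invented for this example and are not part of any library.

```python
from collections import defaultdict


def build_concordance(events):
    """Group every ID seen alongside a known user_id (hypothetical helper).

    `events` is assumed to be an iterable of dicts with the keys
    user_id, domain_userid, network_userid and domain_sessionid.
    """
    # First pass: learn which domain_userid belongs to which user_id.
    # If one cookie was seen with several user_ids (shared device),
    # the last one wins here; a real model would need an explicit rule.
    owner = {}
    for event in events:
        if event.get("user_id"):
            owner[event["domain_userid"]] = event["user_id"]

    # Second pass: attach every ID on an owned cookie's events to that user,
    # including events recorded before the user was identified.
    table = defaultdict(lambda: {
        "domain_userids": set(),
        "network_userids": set(),
        "domain_sessionids": set(),
    })
    for event in events:
        user_id = owner.get(event["domain_userid"])
        if user_id is None:
            continue  # never identified: leave out of the concordance
        row = table[user_id]
        row["domain_userids"].add(event["domain_userid"])
        if event.get("network_userid"):
            row["network_userids"].add(event["network_userid"])
        if event.get("domain_sessionid"):
            row["domain_sessionids"].add(event["domain_sessionid"])
    return dict(table)
```

Cookies that never see a login simply stay out of the table, so anonymous activity still needs a fallback identifier.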
However, you might apply different rules in different contexts. I always explain this with a contrived example:
- If you want to target advertising at someone, you can often gain by being a bit sloppy, since your best outcome is a tiny improvement in click-through rates. For some use cases it even makes sense. If I’ve been looking at home insurance products, it’s likely a household decision so advertising it to my partner is probably a good bet since we’re both in the consideration group for the product.
- If you need to target a message more tightly, you might need to be more certain. For example, if you want to remind me about my appointment at the sexual health clinic with “Hey Simon, don’t forget to get your STD check tomorrow”, I probably don’t want others who share my IP address to get that message. In fact, for a case like this you probably want me to have logged in very recently!
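The stricter rule in the second case could look something like the sketch below. The function name `may_send_sensitive` and the 30-minute window are made up for illustration; the right threshold depends entirely on your use case.

```python
from datetime import datetime, timedelta


def may_send_sensitive(last_login_at, now, max_age=timedelta(minutes=30)):
    """Allow a sensitive message only if the user logged in recently.

    `last_login_at` is None when the user was never identified by login.
    The 30-minute default is an arbitrary, hypothetical threshold.
    """
    if last_login_at is None:
        return False
    age = now - last_login_at
    # Reject logins recorded "in the future" as well as stale ones.
    return timedelta(0) <= age <= max_age
```

The advertising case needs no such gate, which is the point: the same IDs get different rules depending on what a wrong match would cost.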
So applying this kind of logic can easily get quite complex. You might decide to use network_userid for Chrome browsers and fall back to domain_userid for unidentified users elsewhere. You might be sloppy and do some clustering around IP addresses, but not cluster when there’s more than a certain threshold of uniques from a particular IP address (shared company IP address example). And for some purposes you might be really specific about how you group users.
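To make those rules concrete, here is one possible sketch in Python. Everything in it is an assumption for illustration: events are plain dicts, `browser` is a hypothetical field holding the browser family, `lookup` is a hypothetical dict from domain_userid to user_id, and the threshold of 5 uniques per IP is arbitrary.

```python
from collections import defaultdict

MAX_UNIQUES_PER_IP = 5  # hypothetical threshold; tune for your traffic


def resolve_id(event, lookup):
    """Pick the best available identifier for an event.

    Returns a (kind, value) tuple. `lookup` maps domain_userid -> user_id.
    """
    user_id = event.get("user_id") or lookup.get(event["domain_userid"])
    if user_id:
        return ("user_id", user_id)
    # Only trust the third-party cookie where it is likely to be stable.
    if event.get("browser") == "Chrome" and event.get("network_userid"):
        return ("network_userid", event["network_userid"])
    return ("domain_userid", event["domain_userid"])


def ip_clusters(events, max_uniques=MAX_UNIQUES_PER_IP):
    """Group domain_userids by IP, skipping IPs with too many uniques."""
    by_ip = defaultdict(set)
    for event in events:
        by_ip[event["user_ipaddress"]].add(event["domain_userid"])
    # Keep only IPs shared by a few cookies (likely a household);
    # drop busy IPs (likely an office or other shared network).
    return {
        ip: ids for ip, ids in by_ip.items()
        if 1 < len(ids) <= max_uniques
    }
```

You would use `resolve_id` for the careful cases and reach for `ip_clusters` only where a sloppy match is acceptable.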