If this release introduces pseudonymisation using hashing, does anonymisation use two-way encryption?
I would have thought it would be the other way around - pseudonymisation uses encryption (so that the original information can be re-extracted and used if necessary) and anonymisation would use one-way hashing algorithms like SHA-256 (where it’s impossible to get the original data back unless you already have it).
Hi @jrpeck1989, both encryption and hashing substitute a value with an alias (a pseudonym). In the case of hashing, you could either hash all possible values (if that is feasible) and compare the results to find out what the original value was, or you could build a lookup table of the hashed values (as we are doing in a subsequent release, although we are also adding a salt, and the lookup table will be secured). The point is that accidental and casual use of a data subject’s PII is averted, but with sufficient resources and internal knowledge it is not impossible to recover at least some information. To me, true anonymisation would be to replace each PII value with a random value, or to downsample it sufficiently (e.g. 192.168.255.1 -> 192.168.x.x or “Jim Beam” -> “J B”), before that information hits any permanent storage, although I cannot imagine how you would be able to do that on a per data subject basis. At least that is my understanding of the two terms. I am happy to be told otherwise. What are your thoughts?
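To make the distinction concrete, here is a minimal Python sketch of the ideas above. It is only an illustration and not the enrichment's actual code: the function names, the fixed `192.168.255.` prefix and the salt value are all made up for the example.

```python
import hashlib
from typing import Optional


def pseudonymise(value: str, salt: str = "") -> str:
    """Replace a value with its SHA-256 hex digest (optionally salted)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def recover_ipv4(target_hash: str, prefix: str = "192.168.255.") -> Optional[str]:
    """Hash every candidate in a small value space and compare.

    This only works because the space is tiny (256 addresses) and the
    hash is unsalted.
    """
    for last_octet in range(256):
        candidate = f"{prefix}{last_octet}"
        if pseudonymise(candidate) == target_hash:
            return candidate
    return None


def downsample_ipv4(ip: str) -> str:
    """Keep the first two octets only, e.g. 192.168.255.1 -> 192.168.x.x."""
    first, second, _, _ = ip.split(".")
    return f"{first}.{second}.x.x"


def initials(name: str) -> str:
    """Keep only the initials, e.g. "Jim Beam" -> "J B"."""
    return " ".join(part[0] for part in name.split())
```

With an unsalted hash, `recover_ipv4(pseudonymise("192.168.255.1"))` finds the original address again. With a salt the attacker does not know, the same search finds nothing. The two downsampling helpers throw information away for good, which is closer to what I would call anonymisation.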
@jrpeck1989 As of r100 the value is just hashed: each value is replaced with its hash. It is not randomised, meaning it is not substituted with a random value. The original value is not kept in the enrichment, but could possibly be retrieved from raw logs if those logs are not discarded.
In a later release, there will be the option (which will need to be enabled) to keep the mapping of the original value to its hash. That mapping would be kept separate from the rest of the data, as good practice would advise that this information, which constitutes PII of the data subject, should only be used with due justification and when consent is given by the data subject.
@jrpeck1989 No worries. I just wanted to make sure I did not mislead anyone. The original value is sent from the tracker base64 encoded, and hashing takes place in the first actual piece that contains any logic about the content (as opposed to handling its transmission). That is where decoding takes place, along with hashing of the sent values or of values that come from other enrichments (e.g. you could hash the location if you are using the GeoIp lookup enrichment).
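Roughly, the order of operations looks like the Python sketch below. This is not the actual enrichment code: `decode_and_hash` is a hypothetical name, and standard base64 with SHA-256 is assumed purely for illustration.

```python
import base64
import hashlib


def decode_and_hash(encoded_value: str) -> str:
    """Decode a base64-encoded field, then replace it with its SHA-256 hash.

    Assumption: standard base64 alphabet and a UTF-8 payload.
    """
    raw = base64.b64decode(encoded_value)  # decoding happens first
    return hashlib.sha256(raw).hexdigest()  # only the hash is passed on
```

The point is that the hash is computed over the decoded value, so the same original value always yields the same pseudonym regardless of how it was encoded in transit.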
As a firm outside of the EU and a site not targeting users in the EU, are there ways to apply pseudonymization or other GDPR features only to users who are based in the EU?
I’m thinking something along the lines of…
…with the Geolocation enrichment, there is an approximation of the country a user is in: IF the visitor is in one of the 28 member states THEN apply certain rules.
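In rough Python, the logic I have in mind would be something like the sketch below. I am not suggesting this exists as a feature. The field names (`geo_country`, `user_id`, `user_ipaddress`) and the `apply_eu_rules` function are only assumptions for illustration, and the country list is the 28 member states as they stand at the time of writing.

```python
import hashlib

# ISO 3166-1 alpha-2 codes of the 28 EU member states (at the time of writing).
EU_COUNTRIES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR",
    "DE", "GR", "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL",
    "PL", "PT", "RO", "SK", "SI", "ES", "SE", "GB",
})


def apply_eu_rules(event: dict, pii_fields=("user_id", "user_ipaddress")) -> dict:
    """Hash the PII fields only when the approximated country is in the EU."""
    if event.get("geo_country") not in EU_COUNTRIES:
        return event  # non-EU (or unknown) country: leave the event untouched
    pseudonymised = dict(event)  # copy so the input event is not mutated
    for field in pii_fields:
        value = pseudonymised.get(field)
        if value is not None:
            pseudonymised[field] = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return pseudonymised
```

Note that in this sketch an event with no country at all is treated as non-EU, which may well be the wrong default given that the lookup is only an approximation.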