Different geo_country for same user_ipaddress using ip_lookups enrichment

Hello,

We use the GeoIP2 City database for the geo_country enrichment.
We can’t agree among ourselves whether this is normal: we are getting different geo_country values for the same user_ipaddress. It happens even between updates of the GeoIP2 City database, when the database should be static.

It would be highly unusual for the same IP address to resolve to a different country if you aren’t updating the database. What does your enrichment configuration look like?

Hello, mike,

ip_lookups.json.erb looks like:

{
	"schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
	"data": {
		"name": "ip_lookups",
		"vendor": "com.snowplowanalytics.snowplow",
		"enabled": true,
		"parameters": {
			"geo": {
				"database": "GeoLite2-City.mmdb",
				"uri": "<%=ENV['ENRICHER_PATH_TO_MAXMIND_DB']%>"
			}
		}
	}
}

If the database isn’t changing at all, I’m not sure how the same IP address could resolve to two different countries, to be honest. The enricher uses a library that contains an LRU cache, but this is keyed on IP address, so it shouldn’t really make a difference either way.

@ppustoshnyi are you describing the data in atomic.events, or a derived table, e.g. the output of a data model (or somewhere else)?

Hello, Colm,

I am describing atomic.events. I tried queries like these:

select *
from
(
    select user_ipaddress, count(distinct geo_country) as countries
    from atomic.events
    where collector_tstamp > '2023-08-21'
    group by 1 having countries > 1
)
order by 2 desc limit 10;

select *
from
(
    select user_ipaddress, count(distinct geo_country) as countries
    from atomic.events
    where etl_tstamp > '2023-08-21'
    group by 1 having countries > 1
)
order by 2 desc limit 10;

Both returned non-empty results: IPs with more than one country.

As far as I know, the previous update of the GeoIP2 City database was on 2023-08-20.

In that case, I’m afraid I’m where Mike is on this one. I’m not quite sure how that would happen.

If the database was updated on the 20th, and you have this happening on the 21st, then that update would be my first candidate for investigation. If something changed in the database, it is possible that cached values were used for a period, and then new values were found.

It is also possible that caching isn’t responsible, and that a timezone difference accounts for the confusion about which day the events and the update fall on.

I would start by running the same queries for today or yesterday. If either of the above is the explanation, I would expect the issue not to show up in more recent data.
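
One way to check this is to break the same count down by day. The query below is only a sketch: it assumes the same atomic.events table and warehouse as the queries earlier in the thread, the start date is an arbitrary example, and the date cast may need adjusting for your SQL dialect.

```sql
-- Sketch: for each day, count the IPs that resolved to more than one country
-- within that day. The start date below is an arbitrary example value.
select event_day, count(*) as ips_with_multiple_countries
from
(
    select cast(collector_tstamp as date) as event_day,
           user_ipaddress,
           count(distinct geo_country) as countries
    from atomic.events
    where collector_tstamp > '2023-08-14'
    group by 1, 2
    having count(distinct geo_country) > 1
) as per_ip_day
group by 1
order by 1;
```

If the database update is the cause, the counts would be expected to cluster around the update date and drop off afterwards. A count that stays high on every day points to something else.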

The problem is solved.
We had forgotten to add the assetsUpdatePeriod parameter to the enrich configuration, so the enrichers didn’t check for database updates.
After adding this parameter, the number of IPs with more than one geo_country has gone down.
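
For anyone hitting the same issue, the setting looks roughly like the fragment below. This is a sketch, not our exact configuration: the "7 days" value and the top-level placement are assumptions, so check the configuration reference for your Enrich version.

```hocon
{
  # Assumed placement and value, for illustration only.
  # How often enrich checks its assets (such as the MaxMind database) for updates.
  "assetsUpdatePeriod": "7 days"
}
```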
