Changing the IP geo database to a different provider

Hi All,

I'm looking for pointers on whether I can change the IP2Geo database from MaxMind to another provider, IP2Location: http://lite.ip2location.com/file-download. They offer binary files as well.

I tried simply pointing the location at the new database, and the emr-etl-enrich job failed with the following error. Does the binary file need to be named .dat, or is it a format issue?

Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at com.twitter.scalding.Job$.apply(Job.scala:47)
	at com.twitter.scalding.Tool.getJob(Tool.scala:48)
	at com.twitter.scalding.Tool.run(Tool.scala:68)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at com.snowplowanalytics.snowplow.enrich.hadoop.JobRunner$.main(JobRunner.scala:33)
	at com.snowplowanalytics.snowplow.enrich.hadoop.JobRunner.main(JobRunner.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: com.snowplowanalytics.snowplow.enrich.common.FatalEtlError: NonEmptyList(error: NonEmptyList(error: instance value ("IP2LOCATION-LITE-DB11.IPV6.BIN") not found in enum (possible values: ["GeoLiteCity.dat","GeoIPCity.dat"])
    level: "error"
    schema: {"loadingURI":"#","pointer":"/properties/parameters/properties/geo/properties/database"}
    instance: {"pointer":"/parameters/geo/database"}
    domain: "validation"
    keyword: "enum"
    enum: ["GeoLiteCity.dat","GeoIPCity.dat"]
    level: "error"
	at com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob$$anonfun$2.apply(EtlJob.scala:140)
	at com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob$$anonfun$2.apply(EtlJob.scala:140)
	at scalaz.Validation$class.fold(Validation.scala:64)
	at scalaz.Failure.fold(Validation.scala:330)
	at com.snowplowanalytics.snowplow.enrich.hadoop.EtlJob.<init>(EtlJob.scala:139)
	... 16 more

As far as I can tell, no.

The error above shows that the only accepted values are “GeoLiteCity.dat” and “GeoIPCity.dat”. Renaming the file is unlikely to help: the IP2Location binary format (I couldn’t find a specification for it) is probably different from the MaxMind binary format, so reading it would likely require a different library.
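
To illustrate why the job fails before the file is ever opened, here is a minimal Python sketch of the kind of enum check the error message describes. It is not Snowplow's actual validation code; the function name `check_geo_database` is made up, and the config shape is only inferred from the `/parameters/geo/database` pointer in the error.

```python
# Allowed values copied from the "enum" line of the error message above.
ALLOWED_DATABASES = ["GeoLiteCity.dat", "GeoIPCity.dat"]


def check_geo_database(enrichment_config):
    """Return a list of error strings; an empty list means the check passed.

    Hypothetical helper for illustration only, not part of Snowplow.
    """
    # Path taken from the error's instance pointer: /parameters/geo/database
    database = enrichment_config["parameters"]["geo"]["database"]
    if database not in ALLOWED_DATABASES:
        return [
            'instance value ("%s") not found in enum (possible values: %s)'
            % (database, ALLOWED_DATABASES)
        ]
    return []


# Config shapes assumed from the pointer in the error message.
ip2location_config = {
    "parameters": {"geo": {"database": "IP2LOCATION-LITE-DB11.IPV6.BIN"}}
}
maxmind_config = {"parameters": {"geo": {"database": "GeoLiteCity.dat"}}}

print(check_geo_database(ip2location_config))  # one error: value not in enum
print(check_geo_database(maxmind_config))      # no errors
```

Note that renaming the IP2Location file to GeoLiteCity.dat would get past a check like this, but the reader would then most likely fail on the file contents, since the two binary formats are probably not compatible.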


@mike is right - this would need to be a new Snowplow enrichment. I wasn’t able to find a Java SDK in their GitHub repository:

So getting IP2Location to create a Java SDK would be a sensible first step.


A quick update - it looks like IP2Location has a Java SDK, but it is non-free:

This is quite an unusual decision by them, given that the IP (intellectual property) of an IP (Internet Protocol) lookup is in the database, not the client code, and that presumably they want wide usage of their service.

However it does look like they have other open-source libraries, albeit GPL-licensed, here:

There is no Java library for the Lite edition, but they do offer a C library, which could be wired up to work with Java via JNI.

What about extending support to commercial integrations? Akamai EdgeScape is an exceptionally good product and integration is easy (example: https://github.com/dashirov/apex-etl-app/blob/develop/src/main/java/com/iaccap/data/apex/etl/app/AkamaiEdgescapeGeoIPExtractor.java).


IP2Location has open-sourced its Java SDK, which can be downloaded at https://github.com/ip2location/ip2location-java. Could this be the first step towards an integration?

I think this certainly enables the functionality. There are quite a few databases there, though, so a PR to integrate all of them might be fairly large. In theory the code should be reasonably simple if it is written as a separate enrichment in the current pipeline, with its own ip2location schemas in Iglu Central.
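
As a very rough sketch of that shape (in Python for brevity, although a real enrichment would live in the Scala codebase), the enrichment could wrap a provider-specific lookup and attach the result as a context with its own schema. Everything here is hypothetical: the schema URI, the `InMemoryLookup` class standing in for the IP2Location SDK, and the dummy location data. Only `user_ipaddress` is meant to match the existing event field.

```python
import ipaddress

# Hypothetical schema URI for illustration; no such schema is known to exist.
HYPOTHETICAL_SCHEMA = "iglu:com.ip2location/geo_context/jsonschema/1-0-0"


class InMemoryLookup:
    """Stand-in for a real database reader, such as a wrapper around the vendor SDK."""

    def __init__(self, ranges):
        # ranges: list of (cidr_string, location_dict) pairs
        self._ranges = [(ipaddress.ip_network(cidr), loc) for cidr, loc in ranges]

    def lookup(self, ip):
        """Return the location dict for the first matching range, or None."""
        address = ipaddress.ip_address(ip)
        for network, location in self._ranges:
            if address in network:
                return location
        return None


def enrich(event, geo_lookup):
    """Return a copy of the event with a geo context attached when the IP resolves."""
    enriched = dict(event)
    location = geo_lookup.lookup(event["user_ipaddress"])
    if location is not None:
        context = {"schema": HYPOTHETICAL_SCHEMA, "data": location}
        enriched["contexts"] = list(event.get("contexts", [])) + [context]
    return enriched


# Dummy data: 203.0.113.0/24 is a documentation-only range, "ZZ" is a placeholder.
lookup = InMemoryLookup([("203.0.113.0/24", {"country_code": "ZZ"})])
matched = enrich({"user_ipaddress": "203.0.113.7"}, lookup)
unmatched = enrich({"user_ipaddress": "198.51.100.1"}, lookup)
print(matched)
print(unmatched)
```

Keeping the lookup behind a small interface like this is what would let each IP2Location database variant, or another provider entirely, plug in without touching the existing MaxMind enrichment.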