Hi there,
I’m having trouble using the IP lookups enrichment. I have already configured the enrichment JSON file:
{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoIPCity.dat",
        "uri": "http://my-endpoint.s3.amazonaws.com/third-party/maxmind"
      },
      "isp": {
        "database": "GeoIPISP.dat",
        "uri": "http://my-endpoint.s3.amazonaws.com/third-party/maxmind"
      },
      "organization": {
        "database": "GeoIPOrg.dat",
        "uri": "http://my-endpoint.s3.amazonaws.com/third-party/maxmind"
      }
    }
  }
}
But it does not work. The error is:
{
  "line": "xxx",
  "errors": [
    {
      "level": "error",
      "message": "Could not extract geo-location from IP address [179.74.112.99]: [java.lang.ArrayIndexOutOfBoundsException]"
    }
  ],
  "failure_tstamp": "2016-12-20T14:31:03.962Z"
}
I also changed the content type of the *.dat files to binary/octet-stream, but the error still happens.
The error starts when I add isp and/or organization; it works when only geo is enabled.
The files are OK: I have already validated them using the GeoIP gem.
What is wrong?
Thanks very much
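In this configuration each lookup pairs a directory "uri" with a "database" filename. As a quick sanity check, the Python sketch below lists the full object location each configured lookup would point at. It assumes the enrichment joins the two with a single "/" (an assumption, not taken from the Snowplow source), and database_locations is a hypothetical helper.

```python
import json


def database_locations(config_text):
    """Map each configured lookup (geo, isp, ...) to the object it points at.

    Hypothetical helper; assumes the location is built as uri + "/" + database.
    """
    params = json.loads(config_text)["data"]["parameters"]
    locations = {}
    for slot, entry in params.items():
        # Strip a trailing slash so ".../maxmind/" and ".../maxmind" agree
        locations[slot] = entry["uri"].rstrip("/") + "/" + entry["database"]
    return locations


# Usage: paste the enrichment JSON into config_text, then
# for slot, path in database_locations(config_text).items():
#     print(slot, path)
```

Each printed path should correspond to an object that actually exists in the bucket.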
Hi @alex,
Yes, my files are public. How can I host them privately and use them with Snowplow?
alex
December 21, 2016, 8:25am
Use a “s3://” path on a bucket that is not publicly viewable (but is accessible by the user running EMR).
I changed to “s3://” and removed the “everybody” read permission, but only “geo” works. ISP and Organization still give the same error:
Could not extract geo-location from IP address [179.74.112.99]: [java.lang.ArrayIndexOutOfBoundsException]
alex
December 27, 2016, 8:11am
Can you re-post your full updated configuration?
@alex Sure:
{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": false,
    "parameters": {
      "geo": {
        "database": "GeoIPCity.dat",
        "uri": "s3://myendpoint.s3.amazonaws.com/third-party/maxmind"
      },
      "isp": {
        "database": "GeoIPISP.dat",
        "uri": "s3://myendpoint.s3.amazonaws.com/third-party/maxmind"
      },
      "organization": {
        "database": "GeoIPOrg.dat",
        "uri": "s3://myendpoint.s3.amazonaws.com/third-party/maxmind"
      }
    }
  }
}
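Note the "uri" values in this configuration: with the s3:// scheme, the part right after s3:// should be the bare bucket name, but here it is the bucket's S3 endpoint hostname. A minimal Python sketch that flags the mistakes seen in this thread (endpoint hostname instead of bucket name, http instead of s3 for a private bucket, an empty bucket caused by an extra slash). check_s3_uri is a hypothetical helper, not part of Snowplow, and it assumes you want a private s3:// location as suggested earlier.

```python
from urllib.parse import urlparse


def check_s3_uri(uri):
    """Return a list of problems in an ip_lookups "uri" value (empty if none found).

    Hypothetical helper; assumes private hosting via an s3:// location.
    """
    problems = []
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        problems.append("scheme is %r; use 's3' for a private bucket" % parsed.scheme)
    bucket = parsed.netloc  # text between "//" and the next "/"
    if not bucket:
        problems.append("no bucket name after '//' (extra slash?)")
    elif bucket.endswith(".amazonaws.com"):
        problems.append("%r is an S3 endpoint hostname, not a bucket name" % bucket)
    return problems


# Usage:
# print(check_s3_uri("s3://myendpoint.s3.amazonaws.com/third-party/maxmind"))
```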
alex
December 27, 2016, 12:37pm
Hi @thiagogsr, just use the correct S3 bucket names (the bucket name itself, not the S3 endpoint hostname):
"uri": "s3://mybucket/third-party/maxmind"
Hi @alex, thanks for your attention, but it still doesn’t work:
"errors": [
  {
    "level": "error",
    "message": "Could not extract geo-location from IP address [189.9.13.93]: [java.lang.ArrayIndexOutOfBoundsException]"
  }
],
My configuration file:
{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoIPCity.dat",
        "uri": "s3://mybucketname/third-party/maxmind"
      },
      "isp": {
        "database": "GeoIPISP.dat",
        "uri": "s3://mybucketname/third-party/maxmind"
      },
      "organization": {
        "database": "GeoIPOrg.dat",
        "uri": "s3://mybucketname/third-party/maxmind"
      }
    }
  }
}
alex
December 31, 2016, 9:35am
Using the AWS CLI, can you list the bucket contents, please?
$ aws s3 ls s3://mybucketname/third-party/maxmind/
Thanks!
Hi @alex,
Here is the output:
2016-12-19 18:24:14 0
2016-12-20 11:28:28 47721533 GeoIPCity.dat
2016-12-20 11:28:36 4189407 GeoIPISP.dat
2016-12-20 11:28:44 20307977 GeoIPOrg.dat
2016-12-20 11:33:10 17760694 GeoLiteCity.dat
I’m using version 0.9.0.
@alex, is there a way to fix it?
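A listing like this can be cross-checked against the "database" names in the configuration. A small Python sketch, assuming the default aws s3 ls output format (date, time, size and name on each object line); missing_databases is a hypothetical helper. The zero-byte line with no name is most likely the folder placeholder object and is skipped.

```python
def missing_databases(ls_output, expected_names):
    """Return the expected .dat names that do not appear in `aws s3 ls` output.

    Hypothetical helper; assumes the default text output of `aws s3 ls`.
    """
    present = set()
    for line in ls_output.splitlines():
        parts = line.split()
        # Object lines are "DATE TIME SIZE NAME"; "PRE dir/" lines and the
        # nameless zero-byte placeholder have fewer fields and are skipped.
        if len(parts) >= 4:
            present.add(" ".join(parts[3:]))
    return sorted(name for name in expected_names if name not in present)


# Usage:
# missing_databases(listing_text, ["GeoIPCity.dat", "GeoIPISP.dat", "GeoIPOrg.dat"])
```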
alex
January 15, 2017, 11:50am
I’m not sure what’s wrong with your setup, @thiagogsr; it still feels like a permissions problem to me.
Are you definitely using the same creds in Stream Enrich as at the command line with the AWS CLI?
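One way to test that directly is to send a HEAD request for each .dat object using the credentials the enrichment process runs with. A Python sketch: can_read and split_s3_uri are hypothetical helpers, and can_read accepts any client exposing head_object(Bucket=..., Key=...), such as boto3.client("s3"). This assumes boto3 is installed; note that boto3 resolves credentials through its own default provider chain, which may not match what Stream Enrich is configured with.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split "s3://bucket/some/key" into ("bucket", "some/key")."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")


def can_read(s3_client, uri):
    """HEAD the object; return (True, size) on success or (False, error text)."""
    bucket, key = split_s3_uri(uri)
    try:
        response = s3_client.head_object(Bucket=bucket, Key=key)
    except Exception as exc:  # boto3 raises botocore's ClientError on 403/404
        return False, str(exc)
    return True, response.get("ContentLength")


# Usage (requires boto3 and credentials):
# import boto3
# print(can_read(boto3.client("s3"), "s3://mybucketname/third-party/maxmind/GeoIPISP.dat"))
```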
It is strange, because the Geo .dat file is in the same path as the ISP and Organization .dat files, and it works. I will check again, thanks.
I’m having the same issue here using Scala Stream Enrich:
"Could not extract geo-location from IP address [xx.xx.xx.xx]: [java.lang.ArrayIndexOutOfBoundsException: 15796200]"
{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoLiteCity.dat",
        "uri": "s3://xxxxxxxxx-snow-plow-assets/third-party/maxmind"
      }
    }
  }
}
I can list the bucket from the CLI:
2017-01-25 13:30:29 0
2017-01-25 14:04:41 17775436 GeoLiteCity.dat
The user has permissions on S3; however, in the IAM Management console, the Access Advisor tab for that user shows no activity on S3:
Not accessed in the tracking period
Which is weird. I am using Stream Enrich 0.10.0; any ideas would be welcome.
Thanks,
Nir
After a lot of troubleshooting, what finally helped was downloading the database via
yum install GeoIP GeoIP-data
and then putting the ip_geo file in the enrichment folder. That made Stream Enrich download the file; after that, the process ran correctly and finished the job successfully.
I would assume some sort of cache is causing this.
I have opened up a ticket:
https://github.com/snowplow/snowplow/issues/3083
It is working now, after downloading the updated files from the MaxMind dashboard:
GeoIP-111_20170103
GeoIP-121_20170103
GeoIP-133_20170117
{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoIPCity.dat",
        "uri": "s3://mybucket/third-party/maxmind"
      },
      "isp": {
        "database": "GeoIPISP.dat",
        "uri": "s3://mybucket/third-party/maxmind"
      },
      "organization": {
        "database": "GeoIPOrg.dat",
        "uri": "s3://mybucket/third-party/maxmind"
      }
    }
  }
}
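Since the fix here turned out to be replacing the .dat files, a cheap first check on a legacy GeoIP file is whether the structure-info block at its end names the edition you expect for that slot. This Python sketch mimics how MaxMind's legacy GeoIP C library locates that block (three 0xFF bytes near the end of the file, followed by an edition byte). The search window and edition codes are assumptions taken from that library, and legacy_edition and matches_slot are hypothetical helpers. It can catch a truncated or mislabelled file, but not a format incompatibility with the Java library the enrichment uses.

```python
# Edition codes as listed in MaxMind's legacy GeoIP C library (assumed);
# only the editions relevant to the ip_lookups slots are included.
CITY_REV1, ISP, ORG, CITY_REV0 = 2, 4, 5, 6
EXPECTED = {"geo": {CITY_REV1, CITY_REV0}, "isp": {ISP}, "organization": {ORG}}

DELIMITER = b"\xff\xff\xff"
# The legacy reader starts 3 bytes before EOF and steps back one byte at a
# time for up to 20 tries, so the delimiter starts within the last 22 bytes.
TAIL_WINDOW = 22


def legacy_edition(path):
    """Return the edition code stored near the end of a legacy .dat file, or None."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to EOF to learn the file size
        size = f.tell()
        f.seek(max(0, size - TAIL_WINDOW))
        tail = f.read()
    pos = tail.rfind(DELIMITER)  # the last marker is the first one the reader meets
    if pos == -1 or pos + len(DELIMITER) >= len(tail):
        return None  # no marker: not a legacy .dat file, or truncated
    edition = tail[pos + len(DELIMITER)]
    if edition >= 106:
        edition -= 105  # per the C library, very old files store edition + 105
    return edition


def matches_slot(path, slot):
    """True if the file's edition is one the given ip_lookups slot expects."""
    return legacy_edition(path) in EXPECTED[slot]


# Usage: matches_slot("GeoIPISP.dat", "isp")
```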