Hi all, first of all, this is a copy of an issue I created on GitHub.
Stream Enrich is failing to download the MaxMind data sets from S3.
I’ve set up my ip_lookups.json as per this wiki page, like this:
{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/1-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoIPCity.dat",
        "uri": "s3://my-private-bucket.s3.amazonaws.com"
      },
      "isp": {
        "database": "GeoIPISP.dat",
        "uri": "s3://my-private-bucket.s3.amazonaws.com"
      },
      "organization": {
        "database": "GeoIPOrg.dat",
        "uri": "s3://my-private-bucket.s3.amazonaws.com"
      }
    }
  }
}
And Stream Enrich throws this error:
[main] ERROR com.snowplowanalytics.snowplow.enrich.kinesis.KinesisEnrichApp$ - Error downloading s3:/my-private-bucket.s3.amazonaws.com/GeoIPCity.dat: java.lang.IllegalArgumentException: The bucket name parameter must be specified when requesting an object
Exception in thread "main" java.lang.RuntimeException: Attempt to download s3:/my-private-bucket.s3.amazonaws.com/GeoIPCity.dat to ./ip_geo failed
at com.snowplowanalytics.snowplow.enrich.kinesis.KinesisEnrichApp$$anonfun$12.apply(KinesisEnrichApp.scala:172)
at com.snowplowanalytics.snowplow.enrich.kinesis.KinesisEnrichApp$$anonfun$12.apply(KinesisEnrichApp.scala:154)
at scala.collection.immutable.List.foreach(List.scala:318)
at com.snowplowanalytics.snowplow.enrich.kinesis.KinesisEnrichApp$delayedInit$body.apply(KinesisEnrichApp.scala:154)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.App$$anonfun$main$1.apply(App.scala:71)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
at scala.App$class.main(App.scala:71)
at com.snowplowanalytics.snowplow.enrich.kinesis.KinesisEnrichApp$.main(KinesisEnrichApp.scala:71)
at com.snowplowanalytics.snowplow.enrich.kinesis.KinesisEnrichApp.main(KinesisEnrichApp.scala)
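One thing I noticed: the URI in the log has a single slash (s3:/...) rather than the s3:// I configured. Below is a minimal Python sketch of how a generic URI parser reads the two forms. It is not Stream Enrich's actual code, and split_s3_uri is a hypothetical helper I wrote for illustration. With a single slash there is no host component, so anything that takes the bucket name from the host would come up empty. Whether that is really what happens inside Stream Enrich is only a guess on my part.

```python
from urllib.parse import urlsplit


def split_s3_uri(uri):
    """Hypothetical helper: split an S3 URI into (bucket, key).

    The bucket is read from the host (netloc) part of the URI.
    Returns None for the bucket when the URI has no host part.
    """
    parts = urlsplit(uri)
    bucket = parts.netloc or None   # empty netloc means no bucket could be read
    key = parts.path.lstrip("/")    # object key without the leading slash
    return bucket, key


if __name__ == "__main__":
    for uri in (
        "s3://my-private-bucket/GeoIPCity.dat",                  # double slash
        "s3:/my-private-bucket.s3.amazonaws.com/GeoIPCity.dat",  # form seen in the log
    ):
        print(uri, "->", split_s3_uri(uri))
```

In this sketch the double-slash form yields the bucket name, while the single-slash form from the log yields no bucket at all and the whole remainder ends up in the key.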
I was told to fix the URIs like this:
"geo": {
  "database": "GeoIPCity.dat",
  "uri": "s3://my-private-bucket"
},
"isp": {
  "database": "GeoIPISP.dat",
  "uri": "s3://my-private-bucket"
},
"organization": {
  "database": "GeoIPOrg.dat",
  "uri": "s3://my-private-bucket"
}
But the process still returns the same error. Any pointers?
Note: I’ve used an EC2 instance role (on AWS) to grant that instance full access to S3.
Thanks!