IP lookup enrichment error in Snowplow Open Source

Hi there,

I have followed the Snowplow documentation and enabled the IP lookup enrichment by adding the following to main.tf.

locals {
  enrichment_ip_lookups = jsonencode(<<EOF
{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": {
          "enum": ["GeoLite2-City.mmdb"]
        },
        "uri": "s3://mybucket/third-party/maxmind"
      }
    }
  }
}
EOF
)
}

# 3. Deploy Enrichment
module "enrich_kinesis" {
  source  = "snowplow-devops/enrich-kinesis-ec2/aws"
  version = "0.2.0"

  name                 = "${var.prefix}-enrich-server"
  vpc_id               = var.vpc_id
  subnet_ids           = var.public_subnet_ids
  in_stream_name       = module.raw_stream.name
  enriched_stream_name = module.enriched_stream.name
  bad_stream_name      = module.bad_1_stream.name

  ssh_key_name     = aws_key_pair.pipeline.key_name
  ssh_ip_allowlist = var.ssh_ip_allowlist

  iam_permissions_boundary = var.iam_permissions_boundary

  telemetry_enabled = var.telemetry_enabled
  user_provided_id  = var.user_provided_id

  custom_s3_hosted_assets_bucket_name = "s3://mybucket/third-party/maxmind/"

  # Linking in the custom Iglu Server here
  custom_iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://myhost/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    }
  ]

  enrichment_ip_lookups = local.enrichment_ip_lookups
}

When I apply the above changes using Terraform, it starts launching a new sp-enrich-server, but I can see the following error in the CloudWatch logs:

{
    "error": "ValidationError",
    "dataReports": [
        {
            "message": "$.parameters.geo.database: does not have a value in the enumeration [GeoLite2-City.mmdb, GeoIP2-City.mmdb]",
            "path": "$.parameters.geo.database",
            "keyword": "enum",
            "targets": [
                "[GeoLite2-City.mmdb, GeoIP2-City.mmdb]"
            ]
        }
    ]
}

All other events also stop because of the above error, and nothing is logged in the database. When I revert the changes, it starts working again.

I have my bucket with the GeoLite2-City.mmdb file in it.

Could you let me know what I am missing to enable the IP lookup enrichment?

Thanks,

Bhumi

Hi @Bhumi, I think there could be two problems here:

  1. Your enrichment JSON is not correct from what I can tell: you have a nested enum layer that should not be there. It should look like this:
{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoLite2-City.mmdb",
        "uri": "s3://mybucket/third-party/maxmind"
      }
    }
  }
}
  2. In your custom_s3_hosted_assets_bucket_name you just need to provide the name of the bucket, not an S3 URI, so the IAM policy is likely not going to work as expected to resolve the files once you fix your enrichment input.
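Concretely, the module input would become just the bucket name (a sketch, assuming the bucket in your URI is literally named `mybucket`; the full `s3://` path stays in the enrichment JSON's `uri` field):

```hcl
# Bucket name only - used by the module to build the IAM policy
custom_s3_hosted_assets_bucket_name = "mybucket"
```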

Could you give that a try and see if it fixes it for you?

Hello Joshua,

Thank you for your response. Let me correct this and try again. Also, could you let me know whether I need to set any bucket policy or permissions manually?

Thanks,

Bhumi

Hello Joshua,

That has worked! The enrich server has launched without any errors. I tried the same schema earlier but forgot to include the bucket name. Thank you for highlighting the configuration errors! Much appreciated.

How do I check whether the IP enrichment is working or not? As I understand it, I should see geo data in the atomic.events table?

My application is built using React Native. Do I need to use a custom context there?

Hey @Bhumi indeed you should see the fields indicated in this document starting to be populated (IP Lookup enrichment - Snowplow Docs).

It will use the IP address that the Collector detects when you send an event, so you should not need anything else for it to work!
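If it helps, a quick sanity check once new events start flowing is to query the geo columns directly (a sketch, assuming a Postgres destination with the standard atomic.events table and column names):

```sql
-- Count recent events by detected location; NULL geo_country
-- for all rows would suggest the enrichment is not applied
SELECT
  geo_country,
  geo_city,
  COUNT(*) AS events
FROM atomic.events
WHERE collector_tstamp > NOW() - INTERVAL '1 hour'
GROUP BY geo_country, geo_city
ORDER BY events DESC;
```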

Thank you for the clarification. My application uses custom schemas. We have separate tables in PostgreSQL that store the data collected by the tracker; these tables have columns corresponding to the custom schemas. Is it possible to have the IP enrichment details, like geo_city, geo_country, geo_region, geo_zipcode, geo_latitude, geo_longitude, geo_region_name, etc., in my custom tables? We can modify our custom schemas to add these fields if required.

Thanks,

Bhumi

You could do this with a custom enrichment (e.g., the JavaScript enrichment), but the best way would be to create a data model that derives this information downstream, after Snowplow has inserted the data into Postgres.


As @mike said, you are best off joining your custom events with the atomic.events table to get the geo information and modelling that information out into derived tables.
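For example (a sketch only: `my_custom_events` and its `root_id` column are assumptions based on how the Snowplow Postgres loader typically shapes self-describing event tables, so adjust the names to your actual schema):

```sql
-- Join a custom event table back to atomic.events to pick up the geo fields
SELECT
  c.*,
  e.geo_country,
  e.geo_region,
  e.geo_region_name,
  e.geo_city,
  e.geo_zipcode,
  e.geo_latitude,
  e.geo_longitude
FROM atomic.my_custom_events AS c
JOIN atomic.events AS e
  ON c.root_id = e.event_id;
```

A materialised view or a scheduled data-model step over a query like this gives you the geo columns alongside your custom fields without changing your tracking schemas.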

Okay. But I am not seeing geo location details for all events in atomic.events. Does this mean I have to make any changes to my React Native app code, like invoking a custom context or similar?

If the enrichment is enabled all new events should have the geo fields populated. You should not need to make any changes to your application or add custom contexts.