IP Lookup enrichment for quick start GCP

Hello,

Happy new Snowplow user here.
Now that the pipeline is running smoothly, I want to upgrade it with the IP Lookup enrichment.
Before doing it through Terraform, I have several questions:

  1. If I understand the flow, I need to add the enrichment in the main.tf file and run a new plan → apply, correct?

  2. What would happen to the running server (sp-enrich-server)? Is it replaced by Terraform?
    (update: OK, the template is replaced, as I can see from the new plan)

  3. Can I lose data during the process, since data is still being ingested?

  4. The quick start runs version 2.0.5, correct? Let’s say I wanted to upgrade to 3.5.0: would changing the start script (i.e. the “snowplow/snowplow-enrich-pubsub:3.5.0” line) in the instance template do the job? Or do I need to upload the Docker image somewhere? Is that recommended? What is the best way to do it? Copy the template → modify → replace it in the group?

Thanks!

Hey @dsolito;

If I understand the flow, I need to add the enrichment in the main.tf file and run a new plan → apply, correct?

Yep, that’s exactly right - however, you will first need to upload the database files somewhere accessible in Google Cloud Storage so that your Enrich server can download them.

So your configuration may look something like this:

{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoLite2-City.mmdb",
        "uri": "gs://< your bucket here >/third-party/com.maxmind"
      }
    }
  }
}

The Enrich server already has permission to view objects in Google Cloud Storage, so as long as the bucket is in the same project it should be able to access and download the uploaded mmdb database file.
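One easy mistake with the configuration above is putting the file name into "uri" as well as "database". As a minimal sketch (the helper names `check_ip_lookup_config` and `object_path` and the bucket name `my-bucket` are made up for illustration and are not part of Snowplow), a quick pre-flight check before running terraform apply could look like this, assuming the server looks for the file at the "uri" folder joined with the "database" file name:

```python
def object_path(db: dict) -> str:
    """Join the folder in "uri" with the file name in "database".

    Assumption: the file is expected at <uri>/<database>.
    """
    return db["uri"].rstrip("/") + "/" + db["database"]


def check_ip_lookup_config(config: dict) -> list[str]:
    """Return a list of problems found in an ip_lookups enrichment config."""
    problems = []
    parameters = config.get("data", {}).get("parameters", {})
    for name, db in parameters.items():
        database = db.get("database", "")
        uri = db.get("uri", "")
        # "uri" should be the folder only; the file name belongs in "database".
        if database and uri.rstrip("/").endswith("/" + database):
            problems.append(
                f'{name}: "uri" should be the folder only, without {database}'
            )
    return problems


# Hypothetical bucket name, used only for this example.
good = {
    "data": {
        "parameters": {
            "geo": {
                "database": "GeoLite2-City.mmdb",
                "uri": "gs://my-bucket/third-party/com.maxmind",
            }
        }
    }
}
print(check_ip_lookup_config(good))
print(object_path(good["data"]["parameters"]["geo"]))
```

An empty list means the check found nothing wrong. It only catches this one mistake and does not verify that the object actually exists in the bucket.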

What would happen to the running server (sp-enrich-server)? Is it replaced by Terraform?
(update: OK, the template is replaced, as I can see from the new plan)

Yes, it is automatically replaced for any big change like this.

Can I lose data during the process, since data is still being ingested?

Nope - the service in question follows at-least-once semantics. When it gets terminated, any in-flight messages should be processed and then it is cleanly shut down. If it fails non-gracefully, you might end up with a few duplicates (but you should never lose any data).
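To illustrate why at-least-once gives duplicates rather than losses, here is a toy model in plain Python. It is not the Pub/Sub API or the Enrich code, and `run_consumer` is a made-up name. The only idea it shows is that a message is removed from the queue (acknowledged) only after it has been processed, so a crash between processing and acknowledging leads to redelivery:

```python
from collections import deque


def run_consumer(messages, crash_before_ack_on=None):
    """Process messages, acknowledging each one only after processing.

    If crash_before_ack_on is given, simulate one crash right after that
    message was processed but before it was acknowledged.
    """
    queue = deque(messages)
    processed = []
    crashed = False
    while queue:
        message = queue[0]          # peek: stays queued until acknowledged
        processed.append(message)   # the "enrichment" work happens here
        if message == crash_before_ack_on and not crashed:
            crashed = True
            continue                # restart without ack: message is redelivered
        queue.popleft()             # acknowledge only after successful processing
    return processed


print(run_consumer(["e1", "e2", "e3"]))
print(run_consumer(["e1", "e2", "e3"], crash_before_ack_on="e2"))
```

In the second run "e2" shows up twice in the output, but every event still appears at least once.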

The quick start runs version 2.0.5, correct? Let’s say I wanted to upgrade to 3.5.0: would changing the start script (i.e. the “snowplow/snowplow-enrich-pubsub:3.5.0” line) in the instance template do the job? Or do I need to upload the Docker image somewhere? Is that recommended? What is the best way to do it? Copy the template → modify → replace it in the group?

This one is more on me and the team: we need to keep the quick-start modules up to date!

The latest module version is already using v3, however (see main.tf on the main branch of snowplow-devops/terraform-google-enrich-pubsub-ce on GitHub). So I would recommend that you first update your enrich module version to the latest available while we get the versions up to date again, so that you can then cleanly update to 3.5.x.


Hello @josh,
Thank you for your kind reply. Let’s try it.
I see for v3… So is the tag in the template on GCP (app 2-0-5) correct?

Here you are using module_version: 0.1.2, as per the tag above - the latest module version is 0.1.4, which has the updated app_version of 3.0.3.

So in your Terraform you need to update the module_version for enrich to 0.1.4 to use the updated application version.

OK. The Git repo has not been updated, which explains it :smile:

https://github.com/snowplow/quickstart-examples/tree/main/terraform/gcp/pipeline/default

Update: Enrichment successful
(after a mistake on my side: “GeoLite2-City.mmdb” was also in the “uri” :sweat_smile:).
In the meantime I had no more events. When the enrichment fails, is no more data loaded?
(some events I generated on the website are now missing)

Thanks for your support!


If the enrichment cannot be loaded properly, Enrich never boots and no events are processed. Once it does start working properly, it should pick up from where it last successfully processed events, as expected (there should be no data loss).
