IP Lookup enrichment for quick start GCP

dsolito · November 1, 2022, 12:24pm

Hello,

Happy newbie Snowplow user here.
Now that the pipeline is running smoothly, I want to upgrade it with the IP Lookup enrichment.
Before doing it through Terraform, I have several questions:

If I understand the flow, I need to add the enrichment in the main.tf file and execute a new plan → apply, correct?
What would happen to the running server (sp-enrich-server)? Is it replaced by Terraform?
(update: OK, the template is replaced, as I see from the new plan)
Can I lose data during the process, since data is still being ingested?
The quick start runs version 2.0.5, correct? Let’s say I want to upgrade to 3.5.0: would changing the start script (i.e. the “snowplow/snowplow-enrich-pubsub:3.5.0” line) in the instance template do the job? Or do I need to upload the Docker image somewhere? Is it recommended? What is the best way to do that? Copy the template → modify → replace it in the group?

Thanks!

josh · November 1, 2022, 10:05pm

Hey @dsolito;

If I understand the flow, I need to add the enrichment in the main.tf file and execute a new plan → apply, correct?

Yep, that’s exactly right - however, you will first need to upload the database files somewhere accessible in Google Cloud Storage for your Enrich server to download them from.

So your configuration may look something like this:

{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoLite2-City.mmdb",
        "uri": "gs://< your bucket here >/third-party/com.maxmind"
      }
    }
  }
}

The Enrich server already has permission to view objects in Google Cloud Storage, so as long as the bucket is in the same project it should be able to access and download the uploaded mmdb database file.
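As a quick sanity check before running plan → apply, here is a minimal Python sketch (not part of Snowplow; the function names and the bucket name `my-bucket` are made up for illustration). It assumes that "uri" is meant to be the folder only and "database" the file name, so that the two combine into the full object path. A common slip is to put the file name in "uri" as well.

```python
def check_ip_lookups(config: dict) -> list:
    """Return a list of problems found in an ip_lookups enrichment config."""
    problems = []
    geo = config["data"]["parameters"]["geo"]
    # "uri" should be the folder only; the file name belongs in "database".
    if geo["uri"].rstrip("/").endswith(geo["database"]):
        problems.append('"uri" should not include the database file name')
    return problems


def resolved_path(config: dict) -> str:
    """Join "uri" and "database" into the full object path (assumed layout)."""
    geo = config["data"]["parameters"]["geo"]
    return geo["uri"].rstrip("/") + "/" + geo["database"]


# Hypothetical bucket name, for illustration only.
example = {
    "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
    "data": {
        "name": "ip_lookups",
        "vendor": "com.snowplowanalytics.snowplow",
        "enabled": True,
        "parameters": {
            "geo": {
                "database": "GeoLite2-City.mmdb",
                "uri": "gs://my-bucket/third-party/com.maxmind",
            }
        },
    },
}

print(check_ip_lookups(example))  # an empty list means no problems were found
print(resolved_path(example))     # the object the mmdb file should be uploaded to
```

If the check returns a non-empty list, fix the config before applying; the printed path is where the mmdb file is expected to live in the bucket under this assumption.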

What would happen to the running server (sp-enrich-server)? Is it replaced by terraform?
(update: Ok, the template is replaced as I see from the new plan)

Yes, it is auto-replaced by any big change like this.

Can I lose data during the process, since data is still being ingested?

Nope - the service in question follows at-least-once semantics. So when it gets terminated, any in-flight messages should be processed and then it is cleanly shut down. If it fails non-gracefully then you might end up with a few duplicates (but you should never lose any data).
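If a few duplicates would matter downstream, one way to handle them is to de-duplicate on the event identifier. The following is only a sketch under assumptions: it treats events as plain dicts carrying an `event_id` field and keeps the seen IDs in memory, so it only suits a single batch. It is not how the pipeline itself handles duplicates.

```python
def dedupe_by_event_id(events):
    """Yield each event once, keeping the first occurrence of every event_id."""
    seen = set()
    for event in events:
        event_id = event["event_id"]
        if event_id in seen:
            continue  # redelivered copy of an event we already kept
        seen.add(event_id)
        yield event


# Hypothetical batch in which event "a" was delivered twice.
batch = [
    {"event_id": "a", "page": "/home"},
    {"event_id": "b", "page": "/pricing"},
    {"event_id": "a", "page": "/home"},
]
unique = list(dedupe_by_event_id(batch))
print(len(unique))
```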

The quick start runs version 2.0.5, correct? Let’s say I want to upgrade to 3.5.0: would changing the start script (i.e. the “snowplow/snowplow-enrich-pubsub:3.5.0” line) in the instance template do the job? Or do I need to upload the Docker image somewhere? Is it recommended? What is the best way to do that? Copy the template → modify → replace it in the group?

This is more on myself and the team to keep the quick-start modules up to date!

The latest module version is already using v3, however (https://github.com/snowplow-devops/terraform-google-enrich-pubsub-ce/blob/main/main.tf#L6). So I would recommend just updating your module version for enrich to the latest available first, while we get up to date again on versions so that you can cleanly update to 3.5.x.

dsolito · November 1, 2022, 10:36pm

Hello @josh ,
Thank you for your kind reply. Let’s try it.
I see about v3… So is the tag in the template on GCP correct (app 2-0-5)?

josh · November 1, 2022, 10:52pm

So here you are using module_version: 0.1.2 as per the tag above - the latest module version is 0.1.4 which has the updated app_version of 3.0.3.

So in your Terraform you need to update the module_version for enrich to 0.1.4 to use the updated application version.

dsolito · November 1, 2022, 11:02pm

OK. The Git repo has not been updated, which is why:

https://github.com/snowplow/quickstart-examples/tree/main/terraform/gcp/pipeline/default

dsolito · November 1, 2022, 11:54pm

Update: Enrichment successful
(after a mistake on my side: “GeoLite2-City.mmdb” was also in the “uri”).
I had no more events. When enrichment fails, is no more data loaded?
(some events I generated on the website are now missing)

Thanks for your support!

josh · November 2, 2022, 5:31am

If the enrichment cannot be loaded properly, Enrich never boots and no events are processed. Once it starts working properly, processing should pick up as expected from where you last successfully processed (there should be no data loss).
