We are planning to upgrade our Snowplow components from Dataflow jobs to App Engine in GCP. The upgrade includes moving to the latest versions as below:
Collector: 2.3.0 to 2.4.1
Enricher: Beam Enrich 2.0.1 (Dataflow job) to Enrich Pub/Sub 2.0.3 (App Engine)
BigQuery Loader: 0.6.4 (Dataflow) to 1.0.1 (App Engine)
GCS Loader: 0.3.1 to 0.3.2 (remains on Dataflow)
Repeater/Mutator: to 1.0.1 (from a VM to App Engine)
So I just want to check if you have any guides, steps, or best practices that we can use for the upgrade?
I recommend going all the way to version 2.4.5, which fixes a few bugs and security vulnerabilities compared to 2.4.1. In most cases the upgrade from 2.3.0 is very easy; there is nothing you need to change in your configuration. But if you terminate SSL at the collector, it is a little more complicated, because the SSL configuration changed, as described in the docs.
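For illustration, the collector's SSL section in recent 2.4.x versions takes roughly this shape (a sketch only; the exact keys, defaults, and how the certificate itself is supplied, e.g. via JVM keystore properties, should be confirmed against the collector configuration reference for your target version):

```
# config.hocon (fragment) - hypothetical values for illustration
collector {
  ssl {
    enable = true     # terminate TLS at the collector
    redirect = true   # redirect plain-HTTP traffic to the HTTPS port
    port = 443        # HTTPS listener port
  }
}
```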
The latest version is 2.0.5. It's good that you are moving to enrich-pubsub, because the Dataflow version will soon be deprecated. Our docs site has plenty of information on how to run enrich-pubsub. Compared to the Dataflow version, it has a different command line and config file.
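As a sketch of the new command line (file names here are placeholders; check the enrich-pubsub docs for the full set of flags):

```sh
# Run enrich-pubsub from the published Docker image.
# config.hocon, resolver.json and enrichments/ are illustrative paths.
docker run \
  -v $PWD/config:/snowplow/config \
  snowplow/snowplow-enrich-pubsub:2.0.5 \
  --config /snowplow/config/config.hocon \
  --iglu-config /snowplow/config/resolver.json \
  --enrichments /snowplow/config/enrichments
```

Unlike the Beam version, there are no Dataflow pipeline options; everything (Pub/Sub subscriptions, topics, etc.) lives in the HOCON config file.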
Thank you for your reply.
We have our Snowplow pipeline on GCP. Currently we use Dataflow for the enricher and BQ loader, and we run the mutator/repeater as a jar on a Compute Engine machine.
For the upgrade we are moving from Dataflow to App Engine, so there will be an App Engine service for each of the enricher, BQ loader, and mutator/repeater.
So do you have any guidelines on the steps to follow to migrate all components from Dataflow to App Engine?
I don't believe App Engine is officially supported infrastructure, but if you were to head down this path I'd opt for the App Engine Flex runtime with a Dockerfile for each of these components, which would not be wildly dissimilar to containerising it on Kubernetes or individual virtual machines.
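As a minimal sketch of what that could look like for one component (using enrich-pubsub as the example; image tag, file names, and resource sizes are assumptions to adapt, and App Engine Flex specifics should be checked against the custom-runtime docs):

```
# Dockerfile - wrap the published image with your config baked in
FROM snowplow/snowplow-enrich-pubsub:2.0.5
COPY config.hocon /snowplow/config.hocon
COPY resolver.json /snowplow/resolver.json
CMD [ "--config", "/snowplow/config.hocon", \
      "--iglu-config", "/snowplow/resolver.json" ]
```

```yaml
# app.yaml - App Engine flexible environment with a custom runtime
runtime: custom
env: flex
service: enrich          # hypothetical service name
manual_scaling:
  instances: 1           # these are long-running consumers, not request handlers
resources:
  cpu: 2
  memory_gb: 4
```

One caveat worth weighing: these components are long-running Pub/Sub consumers rather than HTTP request handlers, which is somewhat against the grain of App Engine's request-driven model, so manual scaling (as above) and health-check configuration deserve attention.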