Hello all,
Goal
I am looking for a script or mostly automated solution that we can use to build a simple Snowplow pipeline on GCP with BigQuery as the sink.
Background
We have been trying to build a working GCP pipeline for a few weeks now, but we keep running into various errors and problems, starting with the enrichment phase. What we have tried so far:
- Quick Start Installation Guide on GCP
- Setup Snowplow Open Source on GCP
- Simo Ahava’s “Install Snowplow On The Google Cloud Platform”
- Various tutorials, blogs etc. on Medium and other websites
I won't go into much detail in this thread about the problems with the approaches above (I'll probably keep working on those in parallel in other threads). We had the most hope for the more modern, Terraform-based approach in the current Quick Start Guide. Unfortunately, even there the events do not make it past the enrichment phase.
This approach (GitHub: etnetera-activate/snowplow-gcp-template) already comes very close to the requirements described above, but unfortunately it is two years old and uses very old versions of the enrichment components. This in turn requires old distribution images, outdated Java versions, etc., which creates further problems and is certainly not a sustainable approach.
Long Story Short
Does anyone have a currently working and mostly automated solution for setting up the simple GCP pipeline described above with BigQuery as the sink?
I'd appreciate any help.