I’m following Simo Ahava’s guide to set up Snowplow in GCP. The guide was published 3 years ago, and I’m not sure how I should update some of the command lines.
Right now, I’m trying to set up the VM instance for the ETL process.
According to Simo’s guide, I need to run the following command:
StreamLoader only needs two arguments: --config and --resolver. All the other options, including --runner, are inapplicable for StreamLoader.
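As a rough sketch of what that two-argument invocation could look like: the helper below only builds the --config and --resolver arguments. It assumes both values are passed as single-line base64 strings, as the older guide's command did; your StreamLoader version may accept plain file paths instead, so check the setup guide. The file names and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: builds the two arguments StreamLoader needs.
# Assumption: values are passed base64-encoded on a single line.
# Some versions may take file paths instead; check the setup guide.

# Encode a file as base64 with no line wrapping (portable across GNU/BSD).
b64() {
  base64 < "$1" | tr -d '\n'
}

# Print the argument string for a given HOCON config and resolver file.
# $1 = path to the HOCON config, $2 = path to the Iglu resolver JSON.
streamloader_args() {
  printf -- '--config=%s --resolver=%s\n' "$(b64 "$1")" "$(b64 "$2")"
}

# Example (hypothetical file names), appended to however you launch StreamLoader:
#   <your streamloader launch command> $(streamloader_args config.hocon resolver.json)
```

Note that no --runner or other Dataflow options appear here, since they do not apply to StreamLoader.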
They do apply to Loader, which is still supported for now. Unlike StreamLoader, Loader is designed to run as a custom-container Google Cloud Dataflow job, so you can’t launch it from a jar file; you can only run it using the official image from Docker Hub.
For more information, check out the setup guide, especially the command line options for StreamLoader and Loader.
You can also find a detailed configuration reference for the HOCON file here.