The quickstart for GCP works well - events are streaming to the Postgres database. To continue my exploration of Snowplow as a solution, I need to add the BQ streamloader (plus the mutator and repeater) to my quickstart setup. To start, I am trying to deploy the streamloader Docker image in GKE via the cloud console, but the docs only show how to apply the config via the CLI with a docker run command. I am no Docker/GKE master and something is definitely missing in my understanding.
This may be obvious to someone more familiar, but how do you apply the HOCON config file when deploying in GKE?
… and if GKE is not the preferred method, can someone point me in the direction of the appropriate way?
One popular way of doing it would be to deploy each application (streamloader, mutator and repeater) as a GKE Deployment. For the HOCON config and the Iglu resolver JSON, you can use another Kubernetes resource, called a ConfigMap.
In the Deployment, you specify the name and version of the container to use, as well as the arguments to be passed to the container. You would store the configuration as ConfigMap objects, which you then pass on to the Deployments via the args.
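For illustration, a minimal sketch of that shape in plain Kubernetes YAML (the names, labels and image tag below are placeholders, not values from the Snowplow docs):

```yaml
# Sketch only: one ConfigMap holding the configuration, and one Deployment per
# application (streamloader shown here; mutator and repeater follow the same pattern).
apiVersion: v1
kind: ConfigMap
metadata:
  name: bq-loader-config                  # placeholder name
data:
  config.hocon: "<contents of config.hocon>"
  iglu-resolver.json: "<contents of iglu_resolver.json>"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bq-streamloader                   # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bq-streamloader
  template:
    metadata:
      labels:
        app: bq-streamloader
    spec:
      containers:
        - name: streamloader
          image: "snowplow/snowplow-bigquery-streamloader:<version>"  # image/tag as per the docs
          args: []                         # the config and resolver are passed here; see below
```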
I am basically at the same point. Dilyan, thanks for your direction, but I guess I need a bit more context. I was trying to set up a pod for the stream-loader using this config:
which gets me this error message:
[ioapp-compute-0] ERROR com.snowplowanalytics.snowplow.storage.bigquery.streamloader.Main - Usage: snowplow-bigquery-streamloader
I guess it’s easy to resolve but my experience with pod deployment is a bit limited.
From what I can work out, the BigQuery Loader currently forces you to provide a base64-encoded config on the command line. So it is not possible to use the --config=/etc/config/config style of command, and therefore the ConfigMap idea will not work. @dilyan please correct me if I’m wrong about any of that.
So I think your only option is to set your command like this:
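Something roughly like the following, i.e. the base64 string itself as the argument value (a sketch with placeholders; the strings can be produced with e.g. base64 -w0 config.hocon):

```yaml
# Sketch of the container args in the pod spec: the values are the
# base64-encoded file contents themselves, not paths to files.
args:
  - "--config"
  - "<output of: base64 -w0 config.hocon>"
  - "--resolver"
  - "<output of: base64 -w0 iglu_resolver.json>"
```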
I think a more helpful solution would be if the BigQuery Loader accepted files on the command line. I opened this GitHub issue to add that feature, but I cannot make any promises on when we will make that change.
@Timo_Dechau , as @istreeter mentioned, you currently need to pass the whole HOCON as a base64-encoded string.
It is still possible to use a ConfigMap for that. You need to ensure you have a record like "config.hocon" = "bXlDb25maWdIb2Nvbg==" and then in the Deployment args you’ll refer to the config.hocon key from the ConfigMap.
This depends on how you create the ConfigMap, but for example with Terraform, you can have a parameterised config_hocon.tpl template file, which you would render with the inputs you provide and place as the value of the ConfigMap’s config.hocon key.
data "template_file" "config" {
template = file("/path/to/config_hocon.tpl")
vars = {
PROJECT_ID = var.project_id
}
}
resource "kubernetes_config_map" "bq_loader_config" {
metadata {
namespace = var.namespace
name = var.name
}
data = {
"config.hocon" = data.template_file.config.rendered
}
}
You can do something similar for the iglu-resolver.json. Then, when you create the Deployment, the args section would look like:
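For example (a rough sketch in plain YAML terms; the exact syntax depends on how you define the Deployment), you can surface the ConfigMap keys as environment variables and reference them in the args:

```yaml
# Sketch: pull the base64 strings out of the ConfigMap via env vars,
# then reference them in the args (Kubernetes expands $(VAR) defined in env).
env:
  - name: CONFIG_B64
    valueFrom:
      configMapKeyRef:
        name: <config-map-name>        # the ConfigMap created above
        key: config.hocon
  - name: RESOLVER_B64
    valueFrom:
      configMapKeyRef:
        name: <config-map-name>
        key: iglu-resolver.json
args:
  - "--config"
  - "$(CONFIG_B64)"
  - "--resolver"
  - "$(RESOLVER_B64)"
```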
Will try the base64 config first. When I was trying it locally, I always got encoding errors with the base64 string. Let’s see how it looks on the server side.
Were you passing the base64 files themselves or their contents? From your yaml, it looks like you were passing the files. I am pretty sure I had the same errors in my local terminal when I tried passing the files themselves in the docker run command.
It worked when I inserted the file contents as the argument value: --resolver $(cat /snowplow/config/iglu_resolver_b64) --config $(cat /snowplow/config/config_b64)
Notice I made two changes: you don’t need “snowplow-bigquery-streamloader”, because that command is run automatically by the Docker image. And the config and resolver arguments should be separate strings in an array, not a single string. The same is true if you use the -D syntax.
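Roughly, the difference looks like this (placeholder values):

```yaml
# Before: the command name repeated and everything squeezed into one string
args: ["snowplow-bigquery-streamloader --config <b64 config> --resolver <b64 resolver>"]

# After: no command name (the image's entrypoint already runs it), and each
# argument is its own entry in the array
args:
  - "--config"
  - "<base64-encoded config.hocon contents>"
  - "--resolver"
  - "<base64-encoded iglu_resolver.json contents>"
```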
Sorry for bringing this up again. I finally had some time to get the collector working and wanted to test the stream loader.
But the service gets a PERMISSION_DENIED: User not authorized to perform this action error and can’t load the events.
The pod is using the default service account, which is the default compute service account. I granted this account Pub/Sub and BigQuery admin access, and when nothing was working I even tested with Owner permissions. Same error.