So I have managed to orchestrate all the relevant Docker containers for a native GCP setup onto a Kubernetes cluster, where both the BQ Loader and Beam Enrich each fire up a Dataflow job. Each container runs from its own deployment.yaml.
After firing up the containers, the Beam Enrich workload keeps emitting errors and failing the deployment (see picture below)
Why do they keep failing? According to the logs, it seems that log messages are classified as “errors” even though they are only “info”. Can this be the source of the problem? This would be quite bothersome if you changed this to an --update job instead: would the deployment then fail every time and update the Dataflow job even though nothing new has been added?
All the INFO logs are normal and correspond to Beam building the graph for the Dataflow job.
The details about the error are on the last line: DataflowJobAlreadyExistsException: there is already an active job named beam-enrich. It seems that you’re trying to create another instance of the Beam Enrich job while one is already running. Beam Enrich uses Dataflow autoscaling to scale up or down automatically, and there should always be only one instance of the Dataflow job running.
If the INFO logs are normal, why are they labelled as ERRORs? Is this just a GUI bug in the LogViewer tool?
The very same deployment does manage to fire up a Dataflow job, but the Kubernetes deployment keeps restarting despite the successful submission. This is the reason I get the error you mention above. It does not, however, explain why the deployment keeps restarting even though the first attempt succeeded. If I scroll back to the initial log messages, they show that a job was successfully submitted from the deployment.
I understand why the deployment keeps restarting (Deployments in Kubernetes have restartPolicy set to Always by default, and this cannot be changed). What I don’t understand is why the problem persists even though a Dataflow job was successfully submitted. Here is the beginning of the logging. As you can see, the job was successfully submitted, but the pod is run again.
I also checked whether any error or warning logs were generated, but the job itself was created successfully.
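Since the enrich container only submits a Dataflow job and then exits, a Kubernetes Job (which allows restartPolicy: Never or OnFailure) is a better fit than a Deployment, whose pods are always restarted. A minimal sketch of that approach follows; the image tag, project, bucket paths and args are placeholders, not taken from the actual setup:

```yaml
# Hypothetical manifest: run the one-shot Dataflow submitter as a
# Kubernetes Job instead of a Deployment, so Kubernetes does not
# restart the pod after a successful submission.
apiVersion: batch/v1
kind: Job
metadata:
  name: beam-enrich-submit
spec:
  backoffLimit: 2           # retry a couple of times on genuine failures
  template:
    spec:
      restartPolicy: Never  # Jobs allow Never/OnFailure; Deployments force Always
      containers:
        - name: beam-enrich
          image: snowplow/beam-enrich:latest          # placeholder tag
          args:
            - "--runner=DataflowRunner"
            - "--job-name=beam-enrich"
            - "--project=my-gcp-project"              # placeholder
            - "--staging-location=gs://my-bucket/staging"  # placeholder
```

With this shape, a successful submission leaves the pod in Completed state instead of triggering a restart and the subsequent “already exists” exception.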
The first time the microservice runs, it successfully creates a Dataflow job; however, the microservice restarts itself every time, causing the “already exists” exceptions.
One thing I find weird is that the Beam Enrich container fails while the bq-loader one works.
The only difference I see is that I have to run beam-enrich as a privileged user. If I don’t, the job doesn’t have permission to access the stagingLocation bucket. This doesn’t happen in the bq-loader case.
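Running the container privileged shouldn’t be necessary for GCS access; a common pattern is to mount a service-account key from a Kubernetes Secret and point GOOGLE_APPLICATION_CREDENTIALS at it, so the Google Cloud client libraries pick up the credentials automatically. A sketch, assuming a Secret named beam-enrich-sa-key exists (secret name, image tag and paths are placeholders):

```yaml
# Hypothetical alternative to running privileged: mount a GCP
# service-account key and expose it via GOOGLE_APPLICATION_CREDENTIALS,
# the standard env var for Application Default Credentials.
spec:
  containers:
    - name: beam-enrich
      image: snowplow/beam-enrich:latest   # placeholder tag
      env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /var/secrets/google/key.json
      volumeMounts:
        - name: gcp-sa-key
          mountPath: /var/secrets/google
          readOnly: true
  volumes:
    - name: gcp-sa-key
      secret:
        secretName: beam-enrich-sa-key     # placeholder secret name
```

If the service account has roles granting access to the stagingLocation bucket, the privileged flag should no longer be needed.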
Could this be because bq-loader and Beam Enrich were implemented differently by different people?