So I have managed to orchestrate all the relevant Docker containers for a native GCP setup onto a Kubernetes cluster, where both the BQ Loader and Beam Enrich each fire up a Dataflow job. Each container runs from its own deployment.yaml.
After firing up the containers, the Beam Enrich workload keeps emitting errors and failing the deployment (see picture below)
Why do they keep failing? According to the logs, it seems that log messages are classified as “errors” even though they are only “info”. Can this be the source of the problem? This would be quite bothersome if you changed this to an --update job instead: would the deployment then fail every time and update the Dataflow job even though nothing new has been added?
All the INFO logs are normal and correspond to Beam building the graph for the Dataflow job.
The details about the error are on the last line: DataflowJobAlreadyExistsException: there is already an active job named beam-enrich. It seems that you’re trying to create another instance of the Beam Enrich job while one is already running. Beam Enrich uses Dataflow autoscaling to scale up or down automatically, and there should always be only one instance of the Dataflow job running.
If the INFO logs are normal, why are they labelled as ERRORs? Is this just a GUI bug in the LogViewer tool?
The very same deployment does manage to fire up a Dataflow job, but the Kubernetes deployment keeps restarting despite the successful submission. This is the reason I get the error you mention above. It does not, however, explain why the deployment keeps restarting even though the first attempt succeeded. If I scroll back to the initial log messages, they show that a job was successfully submitted from the deployment.
I understand why the deployment keeps restarting (Deployments in Kubernetes have restartPolicy set to Always by default, and this cannot be changed). What I don’t understand is why the problem persists even though a Dataflow job was successfully submitted. Here is the beginning of the logging. As you can see, the job was successfully submitted, but the pod is run again.
I also checked whether any error or warning logs were generated, but the job itself was created successfully.
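Since the enrich container only submits a Dataflow job and then exits, a Kubernetes Job (which allows restartPolicy: Never or OnFailure) is a better fit than a Deployment, whose pods are always restarted. A minimal sketch of that approach follows; the image tag, project, bucket paths and args are placeholders, not taken from the actual setup:

```yaml
# Hypothetical manifest: run the one-shot Dataflow submitter as a
# Kubernetes Job instead of a Deployment, so Kubernetes does not
# restart the pod after a successful submission.
apiVersion: batch/v1
kind: Job
metadata:
  name: beam-enrich-submit
spec:
  backoffLimit: 2           # retry a couple of times on genuine failures
  template:
    spec:
      restartPolicy: Never  # Jobs allow Never/OnFailure; Deployments force Always
      containers:
        - name: beam-enrich
          image: snowplow/beam-enrich:latest          # placeholder tag
          args:
            - "--runner=DataflowRunner"
            - "--job-name=beam-enrich"
            - "--project=my-gcp-project"              # placeholder
            - "--staging-location=gs://my-bucket/staging"  # placeholder
```

With this shape, a successful submission leaves the pod in Completed state instead of triggering a restart and the subsequent “already exists” exception.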
The first time the microservice runs, it successfully creates a Dataflow job; however, the microservice restarts itself every time, causing the “already exists” exceptions.
One thing I find weird is that the Beam Enrich container fails while the bq-loader one works.
The only difference I see is that I have to run beam-enrich as a privileged user. If I don’t, the job doesn’t have permission to access the stagingLocation bucket. This doesn’t happen in the bq-loader case.
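Running the container privileged shouldn’t be necessary for GCS access; a common pattern is to mount a service-account key from a Kubernetes Secret and point GOOGLE_APPLICATION_CREDENTIALS at it, so the Google Cloud client libraries pick up the credentials automatically. A sketch, assuming a Secret named beam-enrich-sa-key exists (secret name, image tag and paths are placeholders):

```yaml
# Hypothetical alternative to running privileged: mount a GCP
# service-account key and expose it via GOOGLE_APPLICATION_CREDENTIALS,
# the standard env var for Application Default Credentials.
spec:
  containers:
    - name: beam-enrich
      image: snowplow/beam-enrich:latest   # placeholder tag
      env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /var/secrets/google/key.json
      volumeMounts:
        - name: gcp-sa-key
          mountPath: /var/secrets/google
          readOnly: true
  volumes:
    - name: gcp-sa-key
      secret:
        secretName: beam-enrich-sa-key     # placeholder secret name
```

If the service account has roles granting access to the stagingLocation bucket, the privileged flag should no longer be needed.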
Could this be because bq-loader and Beam Enrich were implemented differently by different people?