Hey,
I have the Snowplow BigQuery stream loader set up on App Engine flexible. The current scaling metric is CPU utilization, set at 10% (the bqloader doesn't scale enough with other values).
I performed a load test on the pipeline at 8k requests/second for 90 minutes. The bq loader scaled to 163 instances (max is set to 200), but a huge backlog built up and kept increasing until the load test ended. Once the test was over, the backlog started decreasing.
Backlog:
The Dockerfile for the bq stream loader is:
FROM openjdk:18-alpine
COPY snowplow-bigquery-streamloader-1.1.0.jar snowplow-bigquery-streamloader-1.1.0.jar
COPY config.hocon config.hocon
COPY resolver.json resolver.json
COPY script.sh script.sh
RUN apk add --no-cache jq
CMD ["sh", "script.sh"]
The script.sh contents are:
jq '.data.repositories[0].connection.http.uri=env.SCHEMA_BUCKET' resolver.json > tmp.json && mv tmp.json resolver.json
java -jar snowplow-bigquery-streamloader-1.1.0.jar --config $(cat config.hocon | base64 -w 0) --resolver $(cat resolver.json | base64 -w 0)
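For context, that jq line points the first Iglu repository at the bucket named in the SCHEMA_BUCKET env var. A minimal reproduction of just that rewrite (the resolver fragment and bucket URL below are illustrative, not my real config):

```shell
# Write an illustrative resolver fragment with a placeholder repository URI.
cat > resolver.json <<'EOF'
{"data":{"repositories":[{"connection":{"http":{"uri":"http://placeholder"}}}]}}
EOF

# Same rewrite as script.sh: jq's `env` object exposes SCHEMA_BUCKET.
SCHEMA_BUCKET=https://example-schemas.storage.googleapis.com \
  jq '.data.repositories[0].connection.http.uri=env.SCHEMA_BUCKET' resolver.json > tmp.json \
  && mv tmp.json resolver.json

# Show the rewritten URI (prints the SCHEMA_BUCKET value).
jq -r '.data.repositories[0].connection.http.uri' resolver.json
```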
The App Engine service config is:
runtime: custom
api_version: '1.0'
env: flexible
threadsafe: true
env_variables: ...
automatic_scaling:
  cool_down_period: 120s
  min_num_instances: 2
  max_num_instances: 200
  cpu_utilization:
    target_utilization: 0.1
network: ...
liveness_check:
  initial_delay_sec: 300
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 4
  success_threshold: 2
readiness_check:
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: 300
service_account: ...
There are also errors from the bq stream loader:
Is there a way to have the bqloader handle the load more efficiently, so the backlog doesn't keep growing while under load? Could you help with this?