Hey,
I have the Snowplow BigQuery stream loader set up on App Engine flexible. The current scaling metric is CPU utilization, set at 10% (the bqloader doesn't scale enough with other values).
I performed a load test on the pipeline at 8k requests/second for 90 minutes. The bq loader scaled to 163 instances (max is set to 200), but a huge backlog built up and kept increasing until the load test ended. Once the test was over, the backlog started decreasing.
Backlog:
The Dockerfile for the bq stream loader is:
FROM openjdk:18-alpine
COPY snowplow-bigquery-streamloader-1.1.0.jar snowplow-bigquery-streamloader-1.1.0.jar
COPY config.hocon config.hocon
COPY resolver.json resolver.json
COPY script.sh script.sh
RUN apk add --no-cache jq
CMD ["sh", "script.sh"]
The script.sh contents are:
jq '.data.repositories[0].connection.http.uri=env.SCHEMA_BUCKET' resolver.json > tmp.json && mv tmp.json resolver.json
java -jar snowplow-bigquery-streamloader-1.1.0.jar --config $(cat config.hocon | base64 -w 0) --resolver $(cat resolver.json | base64 -w 0)
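For context, that jq line points the first Iglu repository at the bucket named in the SCHEMA_BUCKET env var. A minimal reproduction of just that rewrite (the resolver fragment and bucket URL below are illustrative, not my real config):

```shell
# Write an illustrative resolver fragment with a placeholder repository URI.
cat > resolver.json <<'EOF'
{"data":{"repositories":[{"connection":{"http":{"uri":"http://placeholder"}}}]}}
EOF

# Same rewrite as script.sh: jq's `env` object exposes SCHEMA_BUCKET.
SCHEMA_BUCKET=https://example-schemas.storage.googleapis.com \
  jq '.data.repositories[0].connection.http.uri=env.SCHEMA_BUCKET' resolver.json > tmp.json \
  && mv tmp.json resolver.json

# Show the rewritten URI (prints the SCHEMA_BUCKET value).
jq -r '.data.repositories[0].connection.http.uri' resolver.json
```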
The App Engine service config is:
runtime: custom
api_version: '1.0'
env: flexible
threadsafe: true
env_variables: ...
automatic_scaling:
  cool_down_period: 120s
  min_num_instances: 2
  max_num_instances: 200
  cpu_utilization:
    target_utilization: 0.1
network: ...
liveness_check:
  initial_delay_sec: 300
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 4
  success_threshold: 2
readiness_check:
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: 300
service_account: ...
There are also errors from the bq stream loader:
Is there a way to have the bqloader handle the load more efficiently, so the backlog doesn't keep growing while under load? Could you help with this?