I have the BigQuery stream loader set up on App Engine flexible environment. The current scaling metric is CPU utilization, with the target set to 10% (the loader doesn't scale out enough with other values).
I ran a load test on the pipeline at 8k requests/second for 90 minutes. The loader scaled to 163 instances (max is set to 200), but a large backlog built up and kept growing until the load test ended. Once the test was over, the backlog started to decrease.
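For a sense of scale, here is a minimal sketch of the arithmetic behind those figures. It assumes one request produces one event and that load is spread evenly across instances; both are simplifying assumptions, not measurements.

```shell
#!/bin/sh
# Back-of-envelope figures for the load test described above.
# Assumptions (not measured): one request = one event, even spread across instances.
rate=8000        # requests per second during the test
minutes=90       # test duration in minutes
instances=163    # peak instance count reached

total=$((rate * minutes * 60))        # total requests sent over the whole test
per_instance=$((rate / instances))    # integer division, rounds down

echo "total requests: $total"
echo "per-instance rate needed at peak: ~${per_instance}/s"
```

Under these assumptions, each of the 163 instances would need to sustain roughly this per-instance rate just to keep the backlog flat at peak.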
The Dockerfile for the BigQuery stream loader is:
FROM openjdk:18-alpine
COPY snowplow-bigquery-streamloader-1.1.0.jar snowplow-bigquery-streamloader-1.1.0.jar
COPY config.hocon config.hocon
COPY resolver.json resolver.json
COPY script.sh script.sh
RUN apk add jq
CMD sh script.sh
The script.sh contents are:
jq '.data.repositories[0].connection.http.uri=env.SCHEMA_BUCKET' resolver.json > tmp.json && mv tmp.json resolver.json
java -jar snowplow-bigquery-streamloader-1.1.0.jar --config $(cat config.hocon | base64 -w 0) --resolver $(cat resolver.json | base64 -w 0)
The App Engine service config is:
runtime: custom
api_version: '1.0'
env: flexible
threadsafe: true
env_variables: ...
automatic_scaling:
  cool_down_period: 120s
  min_num_instances: 2
  max_num_instances: 200
  cpu_utilization:
    target_utilization: 0.1
network: ...
liveness_check:
  initial_delay_sec: 300
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 4
  success_threshold: 2
readiness_check:
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: 300
service_account: ...
There are errors for the bq stream loader:
Is there a way to have the loader handle this load more efficiently, so that the backlog does not keep growing while under load?
Could you help with this?