A maintenance dependency-update release.
Updating
Core
The Core module contains all the recovery logic and is used to build the individual job runners. This version includes binary-incompatible changes.
Google Dataflow
Update the Dataflow Docker image version to: snowplow/snowplow-event-recovery:0.5.0
Amazon EMR
Updating EMR requires several more changes:
EMR AMI version update
In the dataflow-runner cluster config JSON, update the AMI version to 6.5.0.
Java version update
Upload the following bootstrap script to an S3 bucket that EMR can access:
#!/bin/bash
set -e
cat <<_EOF_> /home/hadoop/secondstage.sh
#!/bin/bash
while true; do
NODEPROVISIONSTATE=\`sed -n '/localInstance [{]/,/[}]/{
/nodeProvisionCheckinRecord [{]/,/[}]/ {
/status: / { p }
/[}]/a
}
/[}]/a
}' /emr/instance-controller/lib/info/job-flow-state.txt | awk ' { print \$2 }'\`
if [ "\$NODEPROVISIONSTATE" == "SUCCESSFUL" ];
then
sleep 10
echo "Running my post provision bootstrap post Hadoop software install"
sudo yum install -y java-11-amazon-corretto
update-alternatives --set java /usr/lib/jvm/java-11-amazon-corretto.x86_64/bin/java
exit;
fi
sleep 10;
done
_EOF_
sudo bash /home/hadoop/secondstage.sh > /home/hadoop/secondstage.sh.log 2>&1 &
exit 0
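The upload step can be sketched with the AWS CLI as follows. This is a minimal sketch, not part of the release: the helper names s3_uri and upload_bootstrap and the local file name in the example are hypothetical, and it assumes the aws command is installed and has write access to the bucket.

```shell
#!/bin/bash

# Build the S3 URI that the bootstrap action's "path" will point at.
s3_uri() {
  printf 's3://%s/%s\n' "$1" "$2"
}

# Syntax-check a local copy of the bootstrap script, then upload it.
# Arguments: local file, bucket name, object key (all placeholders).
upload_bootstrap() {
  bash -n "$1" || return 1   # parse only; nothing is executed
  aws s3 cp "$1" "$(s3_uri "$2" "$3")"
}

# Example (hypothetical file name):
#   upload_bootstrap use-java-11.sh "$BUCKET" "$BOOTSTRAP"
```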
Add the following value to the bootstrapActionConfigs array:
{
  "name": "Use Java 11",
  "scriptBootstrapAction": {
    "path": "s3://$BUCKET/$BOOTSTRAP"
  },
  "args": []
}
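Once the cluster is running, one way to confirm that the bootstrap action switched the default JVM is to inspect the output of java -version on a node. The helper below is a hedged sketch rather than part of the release: java_major is a hypothetical name, and it assumes the usual first-line format of java -version output (a quoted version string after the word "version").

```shell
#!/bin/bash

# Read `java -version` output on stdin and print the major Java version.
# Handles legacy strings such as "1.8.0_312" as well as "11.0.13" or "17".
java_major() {
  sed -n 's/.* version "\([0-9][0-9]*\)\.\{0,1\}\([0-9]*\).*/\1 \2/p' | head -n 1 | {
    read -r first second
    # Legacy versions are reported as 1.x, so the major is the second field.
    if [ "$first" = "1" ]; then echo "$second"; else echo "$first"; fi
  }
}

# Example (java -version writes to stderr, hence the redirect):
#   java -version 2>&1 | java_major
```

On a node where the bootstrap script completed, the helper should print 11.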
Set Java flags through spark-defaults
In the dataflow-runner cluster config, add the following element to the configurations array (or merge it into an existing spark-defaults classification):
{
  "classification": "spark-defaults",
  "properties": {
    "spark.driver.defaultJavaOptions": "-XX:OnOutOfMemoryError='kill -9 %p' -XX:MaxHeapFreeRatio=70",
    "spark.executor.defaultJavaOptions": "-verbose:gc -Xlog:gc*::time -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p' -XX:MaxHeapFreeRatio=70 -XX:+IgnoreUnrecognizedVMOptions"
  },
  "configurations": []
}
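To check that both properties reached a running cluster, the rendered Spark configuration on a node can be searched for the two keys. This is a sketch under assumptions: check_spark_defaults is a hypothetical helper, and it assumes the classification is rendered to a spark-defaults.conf file with whitespace-separated key and value (on EMR this is typically /etc/spark/conf/spark-defaults.conf, but verify the path on your cluster).

```shell
#!/bin/bash

# Report whether each expected property key is present in a spark-defaults file.
# Returns non-zero if any key is missing.
check_spark_defaults() {
  conf_file="$1"
  missing=0
  for key in spark.driver.defaultJavaOptions spark.executor.defaultJavaOptions; do
    if grep -q "^${key}[[:space:]]" "$conf_file"; then
      echo "found: $key"
    else
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}

# Example (path is an assumption):
#   check_spark_defaults /etc/spark/conf/spark-defaults.conf
```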
Full changelog
- Bump cats to 2.7.0 (#112)
- Bump circe-optics to 0.14.1 (#109)
- Bump circe to 0.14.1 (#108)
- Bump cats-effect to 2.5.3 (#107)
- Bump monocle-macro to 2.1.0 (#106)
- Bump cats-core to 2.6.1 (#105)
- Bump snowplow-badrows to 2.1.1 (#104)
- Bump iglu-scala-client to 1.1.1 (#103)
- Bump spark to 3.1.2 (#102)
- Bump jackson-databind to 2.10.5.1 (#101)
- Bump slf4j to 1.7.36 (#100)
- Bump beam-runners-google-cloud-dataflow-java to 2.36.0 (#99)
- Bump scio to 0.11.5 (#98)
- Add tag for beam sink (#115)
- Add detailed check for integration spec (#110)
- Change Docker base image to eclipse-temurin:11-jre-focal (#93)