Event Recovery 0.5.0

A maintenance dependency-update release.

Updating

Core

The Core module contains all the recovery logic and is used to build the individual job runners. Warning: this version includes binary-incompatible changes.

Google Dataflow

Update the Dataflow Docker image version to snowplow/snowplow-event-recovery:0.5.0

Amazon EMR

Updating EMR requires several more changes:

EMR AMI version update

In the dataflow-runner cluster config JSON, update the AMI version to 6.5.0.
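
The edit above can be scripted. The following is a minimal sketch, assuming the version is stored in a field named amiVersion (as in typical dataflow-runner cluster configs); set_ami_version is a hypothetical helper, not part of dataflow-runner.

```shell
#!/bin/sh
# Hypothetical helper: print a cluster config with its amiVersion field rewritten.
# Assumes the field is called "amiVersion"; adjust if your config differs.
set_ami_version() {
  config_file="$1"   # path to the cluster config JSON
  new_version="$2"   # e.g. 6.5.0
  sed -E "s/(\"amiVersion\"[[:space:]]*:[[:space:]]*\")[^\"]*(\")/\1${new_version}\2/" "$config_file"
}

# Usage: set_ami_version cluster.json 6.5.0 > cluster-updated.json
```

The helper writes to stdout rather than editing in place, so the original config stays untouched until the result has been reviewed.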

Java version update

Upload the following bootstrap script to an S3 bucket accessible from EMR:

#!/bin/bash
set -e

cat <<_EOF_> /home/hadoop/secondstage.sh
#!/bin/bash
while true; do
NODEPROVISIONSTATE=\`sed -n '/localInstance [{]/,/[}]/{
/nodeProvisionCheckinRecord [{]/,/[}]/ {
/status: / { p }
/[}]/a
}
/[}]/a
}' /emr/instance-controller/lib/info/job-flow-state.txt | awk ' { print \$2 }'\`
if [ "\$NODEPROVISIONSTATE" == "SUCCESSFUL" ];
then
  sleep 10
  echo "Running my post provision bootstrap post Hadoop software install"
  sudo yum install -y java-11-amazon-corretto
  update-alternatives --set java /usr/lib/jvm/java-11-amazon-corretto.x86_64/bin/java
  exit;
fi
sleep 10;
done
_EOF_
sudo bash /home/hadoop/secondstage.sh > /home/hadoop/secondstage.sh.log 2>&1  &
exit 0
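
One way to upload the script is with the AWS CLI. This is a sketch only: the bucket name, object key and local file name are placeholders, and bootstrap_uri and upload_bootstrap are hypothetical helpers. BUCKET and BOOTSTRAP match the placeholders used in the bootstrap action path.

```shell
#!/bin/sh
# Placeholder values; substitute your own bucket and object key.
BUCKET="my-recovery-bucket"
BOOTSTRAP="bootstrap/use-java-11.sh"

# Build the S3 URI that the bootstrap action path should point at.
bootstrap_uri() {
  printf 's3://%s/%s\n' "$1" "$2"
}

# Copy a local script ($1) to that URI. Requires the AWS CLI and credentials.
upload_bootstrap() {
  aws s3 cp "$1" "$(bootstrap_uri "$BUCKET" "$BOOTSTRAP")"
}

# Usage: upload_bootstrap ./use-java-11.sh
```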

Add the following value to the bootstrapActionConfigs array:

{
  "name": "Use Java 11",
  "scriptBootstrapAction": {
    "path": "s3://$BUCKET/$BOOTSTRAP",
    "args": []
  }
}
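
Once the cluster is up, it may be worth confirming that the bootstrap action actually switched the default JVM. The sketch below parses the output of java -version; java_major_version is a hypothetical helper and assumes the usual `version "x.y.z"` banner format.

```shell
#!/bin/sh
# Hypothetical helper: read `java -version` output on stdin, print the major version.
# Handles both the legacy "1.8.0_312" scheme and the newer "11.0.13" scheme.
java_major_version() {
  awk -F'"' '/version "/ {
    split($2, parts, ".")
    print (parts[1] == "1" ? parts[2] : parts[1])
    exit
  }'
}

# Usage on a cluster node (java -version writes to stderr):
#   java -version 2>&1 | java_major_version
```

A result of 11 indicates the update-alternatives step in the bootstrap script took effect.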

Set Java flags through spark-defaults

In the dataflow-runner cluster config, add the following element to the configurations array (or merge it into an existing spark-defaults classification):

{
  "classification": "spark-defaults",
  "properties": {
    "spark.driver.defaultJavaOptions": "-XX:OnOutOfMemoryError='kill -9 %p' -XX:MaxHeapFreeRatio=70",
    "spark.executor.defaultJavaOptions": "-verbose:gc -Xlog:gc*::time -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p' -XX:MaxHeapFreeRatio=70 -XX:+IgnoreUnrecognizedVMOptions"
  },
  "configurations": []
}
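
To check that the classification was applied, the resulting property can be read back from the Spark configuration on a node. This is a sketch, assuming the file lives at /etc/spark/conf/spark-defaults.conf and uses whitespace-separated "key value" lines; spark_default is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one property from a spark-defaults file.
# $1 = property name, $2 = path to the spark-defaults.conf file
spark_default() {
  awk -v key="$1" '$1 == key {
    sub(/^[^[:space:]]+[[:space:]]+/, "")   # drop the key and the separator
    print
  }' "$2"
}

# Usage on a cluster node (path is an assumption):
#   spark_default spark.executor.defaultJavaOptions /etc/spark/conf/spark-defaults.conf
```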

Full changelog

  • Bump cats to 2.7.0 (#112)
  • Bump circe-optics to 0.14.1 (#109)
  • Bump circe to 0.14.1 (#108)
  • Bump cats-effect to 2.5.3 (#107)
  • Bump monocle-macro to 2.1.0 (#106)
  • Bump cats-core to 2.6.1 (#105)
  • Bump snowplow-badrows to 2.1.1 (#104)
  • Bump iglu-scala-client to 1.1.1 (#103)
  • Bump spark to 3.1.2 (#102)
  • Bump jackson-databind to 2.10.5.1 (#101)
  • Bump slf4j to 1.7.36 (#100)
  • Bump beam-runners-google-cloud-dataflow-java to 2.36.0 (#99)
  • Bump scio to 0.11.5 (#98)
  • Add tag for beam sink (#115)
  • Add detailed check for integration spec (#110)
  • Change Docker base image to eclipse-temurin:11-jre-focal (#93)