We’re pleased to announce release 0.9.0 of Snowplow Snowflake Loader with modernization and performance improvements for Transformer.
Snowflake Transformer 0.9.0 now fully supports the latest EMR AMI (6.4.0) and takes advantage of all its performance and observability improvements. It also no longer needs a custom committer and output format to perform well, so the spark.hadoop.mapreduce.job.outputformat.class setting can be dropped from the playbook.
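When upgrading an existing playbook, the obsolete setting can be filtered out of the Spark step's arguments. The following is a minimal Python sketch, assuming the step's arguments are a flat list of spark-submit tokens in which each setting appears as a "--conf", "key=value" pair. The function name and the example argument values are hypothetical; only the setting key comes from this announcement.

```python
from typing import List

# Setting that Transformer 0.9.0 no longer needs in the playbook.
OBSOLETE_KEY = "spark.hadoop.mapreduce.job.outputformat.class"


def drop_obsolete_conf(args: List[str]) -> List[str]:
    """Return a copy of spark-submit args without the obsolete --conf pair.

    Assumes every setting is passed as two tokens: "--conf", "key=value".
    The single-token "--conf=key=value" form is not handled.
    """
    result: List[str] = []
    i = 0
    while i < len(args):
        token = args[i]
        if token == "--conf" and i + 1 < len(args):
            key = args[i + 1].split("=", 1)[0]
            if key == OBSOLETE_KEY:
                i += 2  # skip both "--conf" and its "key=value" token
                continue
        result.append(token)
        i += 1
    return result


if __name__ == "__main__":
    old_args = [
        "--deploy-mode", "cluster",
        # Hypothetical custom output format class from an older playbook.
        "--conf", OBSOLETE_KEY + "=com.example.CustomOutputFormat",
        "--conf", "spark.executor.memory=4g",
    ]
    print(drop_obsolete_conf(old_args))
    # ['--deploy-mode', 'cluster', '--conf', 'spark.executor.memory=4g']
```

Every other --conf pair is kept untouched, so the rest of the step configuration survives the upgrade unchanged.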
By default, Transformer 0.9.0 uses the EMRFS S3-optimized committer (the default since EMR 6.4.0), but you can also use the community committers:
To use them, pass the --s3a option to the Transformer and configure the Spark step (or the EMR cluster) with the options listed in the documentation above. Our benchmarks show, however, that the EMRFS S3-optimized committer delivers the best performance, although every option outperforms Transformer 0.4.3 with its custom committer.
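As an illustration of the community-committer setup, here is a minimal Python sketch that assembles spark-submit arguments for a Transformer step using one of the Hadoop S3A committers (directory, partitioned or magic). The configuration keys are taken from the Hadoop S3A committer documentation and may differ between EMR and Hadoop versions, so treat the committer docs referenced above as authoritative. The jar path, main class and Transformer arguments are hypothetical placeholders; only the --s3a flag comes from this announcement.

```python
from typing import List

# Committers shipped with Hadoop's S3A connector.
S3A_COMMITTERS = ("directory", "partitioned", "magic")


def s3a_committer_confs(committer: str) -> List[str]:
    """Return spark-submit --conf tokens selecting an S3A committer.

    Keys follow the Hadoop S3A committer docs (an assumption for any given
    EMR release); verify them before use.
    """
    if committer not in S3A_COMMITTERS:
        raise ValueError(f"unknown S3A committer: {committer!r}")
    settings = {
        # Route s3a:// output through the S3A committer factory.
        "spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a":
            "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
        # Choose which S3A committer the factory creates.
        "spark.hadoop.fs.s3a.committer.name": committer,
    }
    if committer == "magic":
        # Older Hadoop versions require the magic committer to be enabled explicitly.
        settings["spark.hadoop.fs.s3a.committer.magic.enabled"] = "true"
    tokens: List[str] = []
    for key, value in settings.items():
        tokens += ["--conf", f"{key}={value}"]
    return tokens


def build_spark_step_args(jar: str, main_class: str,
                          transformer_args: List[str],
                          committer: str) -> List[str]:
    """Assemble spark-submit args for a Transformer step using S3A.

    jar, main_class and transformer_args are hypothetical placeholders;
    copy the real values from your existing playbook.
    """
    return (["spark-submit", "--class", main_class]
            + s3a_committer_confs(committer)  # Spark options go before the jar
            + [jar]
            + transformer_args                # application args go after it
            + ["--s3a"])                      # ask the Transformer to use S3A


if __name__ == "__main__":
    print(build_spark_step_args(
        "s3://my-bucket/snowplow-snowflake-transformer.jar",  # hypothetical
        "com.example.Main",                                   # hypothetical
        ["--config", "<base64 config>"],                      # hypothetical
        "directory",
    ))
```

The --conf tokens are placed before the jar because spark-submit treats everything after the application jar as arguments to the application itself, which is where the --s3a flag belongs.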