Snowplow R89 Plain of Jars released

BenFradet · June 12, 2017, 9:27am

We are pleased to announce the release of Snowplow R89 Plain of Jars:

https://snowplowanalytics.com/blog/2017/06/12/snowplow-r89-plain-of-jars-released

This release ports the Snowplow batch pipeline to Apache Spark, building on our RFC:

http://discourse.snowplow.io/t/migrating-the-snowplow-batch-jobs-from-scalding-to-spark/492

tclass · June 12, 2017, 10:31am

The Bintray link doesn’t work for me http://dl.bintray.com/snowplow/snowplow-generic/snowplow_emr_r89_plain_of_jars.zip

BenFradet · June 12, 2017, 10:44am

Our continuous delivery failed us this morning. We’re currently rebuilding the artifacts one by one, and I’ll post here once they’re all up. Sorry about the inconvenience.

BenFradet · June 12, 2017, 11:30am

Everything is up! Again, apologies for the inconvenience.

mike · June 13, 2017, 5:35am

What a massive release - nice work to everybody who contributed to the Spark port!

tclass · June 14, 2017, 10:30am

Would the change from Scalding to Spark change anything regarding recovering from bad rows?

alex · June 14, 2017, 11:58am

Hi @tclass - are you referring to Hadoop Event Recovery? No, that remains a Scalding-based application, and of course the underlying Snowplow data formats have not changed in this release.

tclass · June 14, 2017, 1:35pm

Yes, that’s what I meant. I just wanted to make sure that we don’t lose that feature while upgrading. Thanks

rbolkey · June 19, 2017, 10:19pm

Any recommendations about AWS instance sizes with the new internals? We’ve been using c3.8xlarges, but with Spark being more memory intensive, are the r3s better now? Is the instance storage still a requirement (e.g. c3/r3 vs c4/r4)?

alex · June 20, 2017, 7:39am

Hi @rbolkey - the c3.8xlarges should be fine, but let us know how you go.

You don’t need instance storage, but if your instances don’t have it (c4/r4) you will need to attach EBS, because we are still using HDFS on the cluster. We’ll remove that usage of HDFS in a future release.
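
The rule above can be sketched as a small check. This is only an illustration, assuming the four instance families named in this thread; `needs_ebs` and `HAS_INSTANCE_STORAGE` are hypothetical names, not part of Snowplow or any AWS library.

```python
# Hypothetical helper, not part of Snowplow or the AWS SDK.
# Only the families mentioned in this thread are listed:
# c3/r3 come with instance storage, c4/r4 are EBS-only.
HAS_INSTANCE_STORAGE = {"c3": True, "r3": True, "c4": False, "r4": False}


def needs_ebs(instance_type: str) -> bool:
    """Return True if EBS must be attached so HDFS has local disk to use."""
    family = instance_type.split(".", 1)[0].lower()
    if family not in HAS_INSTANCE_STORAGE:
        raise ValueError(f"unknown instance family: {family}")
    return not HAS_INSTANCE_STORAGE[family]
```

For example, `needs_ebs("c3.8xlarge")` is false, while `needs_ebs("r4.2xlarge")` is true, so an r4 cluster would need EBS volumes attached.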
