How to estimate the EBS storage size needed for EMR process?

rahul · June 1, 2018, 8:37am

Hi,
We are upgrading our Snowplow stack to Snowplow R89 Plain of Jars, and we need to use EBS storage with the EMR instances.

How do we estimate the EBS storage size needed for our EMR process?

egor · June 4, 2018, 3:49am

Hi @rahul,

Compared to Hadoop, the EBS volume_size can be decreased for Spark, since only the output datasets are written to disk. So you can take the size of your biggest dataset and add 25-50% on top.
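As a rough illustration of that rule of thumb, here is a minimal Python sketch. The function name `estimate_ebs_gb` and its `headroom` parameter are made up for this example and are not part of Snowplow or EMR; the only input taken from the advice above is the 25-50% margin.

```python
import math


def estimate_ebs_gb(largest_dataset_gb: float, headroom: float = 0.5) -> int:
    """Return a suggested EBS volume size in whole GB.

    largest_dataset_gb: size of the biggest dataset written to disk.
    headroom: extra fraction on top (0.25 to 0.5 per the rule of thumb).
    """
    if largest_dataset_gb < 0 or headroom < 0:
        raise ValueError("sizes and headroom must be non-negative")
    # Round up so the volume is never smaller than the estimate.
    return math.ceil(largest_dataset_gb * (1 + headroom))


if __name__ == "__main__":
    # Hypothetical 100 GB dataset, sized at both ends of the margin.
    for margin in (0.25, 0.5):
        print(f"headroom {margin:.0%}: {estimate_ebs_gb(100, margin)} GB")
```

Treat the result as a starting point and check actual disk usage in the EMR monitoring tab after a run.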

However, I’d recommend you upgrade to R92, or even R97 if you use the Clojure collector, to get all the performance benefits.

rahul · June 4, 2018, 6:02am

@egor Which dataset should we consider? Input dataset or the Output dataset?

Thanks in advance

egor · June 4, 2018, 6:30am

The output one, since it will be written to disk. Do note that Spark is memory-hungry compared to Hadoop, so you should allocate enough memory for it (e.g. by using memory-optimized instances).

gareth · June 4, 2018, 2:33pm

We believe that 6-7GB of the disk will be used, presumably by the OS and managed software. We got caught out creating an 8GB EBS volume to process 500MB of data: it filled the disk, and it appears there was only 2GB available for Spark and HDFS to use. Unfortunately, I no longer have a record of the Snowplow enriched and shredded step output size. From the EMR monitoring tab, it does look like it used ~16GB of the available capacity.
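To make that overhead concrete, here is a small Python sketch that subtracts a reserved amount from the volume and, going the other way, adds it to a data estimate. The helper names and the 7GB default are assumptions for illustration only; the 6-7GB figure is just the observation above, not an official EMR number, and the 10GB output size in the example is hypothetical.

```python
import math


def usable_gb(volume_gb: float, reserved_gb: float = 7.0) -> float:
    """Space left for Spark and HDFS after the assumed OS/software overhead."""
    return max(volume_gb - reserved_gb, 0.0)


def volume_for(data_gb: float, reserved_gb: float = 7.0, headroom: float = 0.5) -> int:
    """Volume size in whole GB: assumed overhead plus data plus a safety margin."""
    return math.ceil(reserved_gb + data_gb * (1 + headroom))


if __name__ == "__main__":
    # An 8 GB volume with about 6 GB reserved leaves roughly 2 GB usable.
    print(usable_gb(8, reserved_gb=6))
    # Hypothetical 10 GB of output with 50% headroom on top of 7 GB overhead.
    print(volume_for(10))
```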

rahul · June 6, 2018, 5:12am

Thanks for the reply @gareth and @egor.
