Dealing with technical and economic limitations

Hello snowplowers,
I would like to ask the following questions:

  • What is the byte size of one atomic event record in the Redshift database (with all enrichments)?
  • What is the largest number of rows you have seen in atomic.events while still working properly?
  • How do you deal with the situation where you can’t afford to keep all atomic data in Redshift?
  • What is the maximum number of events per day that you have seen running properly? How many collectors were in that setup, and how often was EmrEtlRunner running?

Why I ask: I would like to store 40M+ events a day. I just finished the testing setup; atomic.events had 24,649 rows at 526 MB, and after one more test run it had 48,949 rows at 528 MB. With an extra +1 MB for safety, that works out to 3 MB ÷ (48,949 − 24,649) ≈ 0.1 kB per event, which would translate to ~4 GB a day and under ~1.5 TB a year, which is kinda OK.
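
For reference, here is the same back-of-the-envelope arithmetic as a small script. The inputs are just the figures from my test runs above plus the 40M/day target; without rounding, the estimate comes out a little above the ~4 GB/day I quoted.

```python
# Back-of-envelope storage estimate from the two test runs above.
rows_before, size_before_mb = 24_649, 526
rows_after, size_after_mb = 48_949, 528
safety_mb = 1                        # extra margin, as above
target_events_per_day = 40_000_000

delta_rows = rows_after - rows_before                    # 24,300 new events
delta_mb = (size_after_mb - size_before_mb) + safety_mb  # 3 MB

kb_per_event = delta_mb * 1024 / delta_rows                    # ~0.13 kB/event
gb_per_day = target_events_per_day * kb_per_event / 1024 ** 2  # ~4.8 GB/day
tb_per_year = gb_per_day * 365 / 1024                          # ~1.7 TB/year

print(f"{kb_per_event:.2f} kB/event, "
      f"{gb_per_day:.1f} GB/day, {tb_per_year:.2f} TB/year")
```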

Thank you for your help! :slight_smile:
Cheers,
Filip

In response to your third question, one direction worth exploring is Redshift Spectrum. You can keep your raw event data in S3 and query it ad hoc when needed, avoiding the (much higher) Redshift storage costs, and instead keep only the modelled data in Redshift for ‘hot’ access.
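
To make that concrete, here is a rough sketch of the idea, assuming the archived enriched events are already registered as a table in the AWS Glue Data Catalog (e.g. via a Glue crawler). The cluster endpoint, IAM role, Glue database and external table names below are all placeholders, not Snowplow defaults.

```python
# Sketch only: every identifier below (endpoint, role ARN, database,
# external table) is a placeholder to be replaced with your own values.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.eu-west-1.redshift.amazonaws.com",
    port=5439,
    dbname="snowplow",
    user="admin",
    password="...",
)
conn.autocommit = True  # run the external DDL outside a transaction block

with conn.cursor() as cur:
    # Expose the Glue catalog database as an external schema in Redshift.
    cur.execute("""
        CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum
        FROM DATA CATALOG
        DATABASE 'snowplow_archive'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-spectrum-role'
        CREATE EXTERNAL DATABASE IF NOT EXISTS;
    """)

    # Ad-hoc query over the raw events sitting in S3; only the modelled
    # ('hot') tables need to live on Redshift block storage.
    cur.execute("""
        SELECT app_id, count(*)
        FROM spectrum.events_archive
        WHERE collector_tstamp >= '2018-01-01'
        GROUP BY 1;
    """)
    for row in cur.fetchall():
        print(row)

conn.close()
```

Spectrum bills per byte scanned rather than per byte stored, so partitioning the archive by date and compressing it keeps those ad-hoc queries cheap.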
