Redshift Indexing

aditya · September 29, 2019, 7:24pm

Hi,

I am aiming to speed up query performance in Redshift, and one way I plan to do that is through indexing. Which columns should I index? I read through Snowplow's GitHub issues, and one approach used in Postgres is to index on collector_tstamp and dvce_created_tstamp.

Any suggestions besides those?
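For reference, the Postgres approach mentioned above amounts to one B-tree index per timestamp column. The sketch below just builds those statements; `create_index_sql` is a hypothetical helper, the index naming scheme is an assumption, and identifiers are not quoted or escaped.

```python
def create_index_sql(schema: str, table: str, column: str) -> str:
    """Build a Postgres CREATE INDEX statement for a single column (hypothetical helper)."""
    # Postgres places the index in the table's schema, so the index name is unqualified.
    name = f"{table}_{column}_idx"
    return f"CREATE INDEX IF NOT EXISTS {name} ON {schema}.{table} ({column});"


for col in ("collector_tstamp", "dvce_created_tstamp"):
    print(create_index_sql("atomic", "events", col))
```

`IF NOT EXISTS` requires Postgres 9.5 or later. As the rest of the thread explains, none of this applies to Redshift, which has no CREATE INDEX.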

EDIT: After further research into Redshift, I found from this reading that there is no actual 'indexing' in Redshift; choosing the DISTKEY and SORTKEY is what plays that role. We already use collector_tstamp and event_id as keys. So I would like to revise my question: are there any other DISTKEY/SORTKEY choices that could serve the purpose of an index?
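Why a SORTKEY behaves like an index: Redshift keeps a zone map (the min and max value) for each column block, and a range filter can skip any block whose range cannot match. The toy simulation below illustrates that idea only; the 4-row block size and the 16 hourly timestamps are made-up assumptions (real Redshift blocks are 1 MB), and it does not model distribution at all.

```python
from datetime import datetime, timedelta

# Hypothetical block size for illustration; real Redshift blocks are 1 MB per column.
BLOCK_SIZE = 4


def build_zone_map(values, block_size=BLOCK_SIZE):
    """Split a column into blocks and record each block's (min, max)."""
    return [
        (min(values[i:i + block_size]), max(values[i:i + block_size]))
        for i in range(0, len(values), block_size)
    ]


def blocks_to_scan(zone_map, start, end):
    """Count blocks that may contain rows with start <= value < end."""
    return sum(1 for lo, hi in zone_map if hi >= start and lo < end)


base = datetime(2019, 9, 1)
timestamps = [base + timedelta(hours=h) for h in range(16)]

# The same 16 rows, stored either in collector_tstamp order or interleaved.
sorted_column = timestamps
order = [i + 4 * j for i in range(4) for j in range(4)]
unsorted_column = [timestamps[k] for k in order]

start, end = base + timedelta(hours=4), base + timedelta(hours=8)
print(blocks_to_scan(build_zone_map(sorted_column), start, end))    # 1
print(blocks_to_scan(build_zone_map(unsorted_column), start, end))  # 4
```

With the rows sorted on the timestamp, a four-hour range touches one block; with the same rows interleaved, every block overlaps the range and must be read.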

mike · September 30, 2019, 1:14am

Snowplow sets sensible defaults for atomic.events and the context tables in Redshift: collector_tstamp as the SORTKEY and event_id as the DISTKEY for atomic.events, and root_tstamp as the SORTKEY and root_id as the DISTKEY for the context tables. You can change these defaults, but in general you'd only do so if you were optimising for a query pattern (or patterns) that differed dramatically from what the Snowplow defaults assume, e.g., by considering a compound sort key on collector_tstamp and app_id.
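As a sketch of what the compound sort key suggestion could look like, the helper below generates Redshift CREATE TABLE DDL with a DISTKEY and a COMPOUND SORTKEY. The table name `atomic.events_example` and its three columns are hypothetical; the real atomic.events has many more columns, and changing keys on an existing table generally means creating a new table and copying data into it.

```python
def redshift_table_ddl(table, columns, distkey, sortkeys):
    """Build Redshift CREATE TABLE DDL with a DISTKEY and a COMPOUND SORTKEY (hypothetical helper)."""
    names = {name for name, _ in columns}
    # Fail early if a key refers to a column that is not in the table.
    for key in (distkey, *sortkeys):
        if key not in names:
            raise ValueError(f"unknown key column: {key}")
    col_lines = ",\n".join(f"  {name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE TABLE {table} (\n{col_lines}\n)\n"
        "DISTSTYLE KEY\n"
        f"DISTKEY ({distkey})\n"
        f"COMPOUND SORTKEY ({', '.join(sortkeys)});"
    )


# Hypothetical, heavily trimmed version of atomic.events.
columns = [
    ("event_id", "CHAR(36) NOT NULL"),
    ("app_id", "VARCHAR(255)"),
    ("collector_tstamp", "TIMESTAMP NOT NULL"),
]
ddl = redshift_table_ddl("atomic.events_example", columns, "event_id", ["collector_tstamp", "app_id"])
print(ddl)
```

Column order matters in a compound sort key: rows are sorted by collector_tstamp first and app_id only within equal timestamps, so filters on collector_tstamp (optionally plus app_id) benefit most, while a filter on app_id alone gains little.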

aditya · October 1, 2019, 3:21am

Thanks @mike for the advice. I guess I just want keys that improve performance in general; we are not yet at the stage of optimising for a particular query pattern.

mike · October 1, 2019, 3:33am

In that case I would stick with the Snowplow defaults for the moment.
