I’m trying to build a table in Redshift that for each purchase event on our website, we have the 5 previous visits, and the associated referral information , so we can get some idea of the most common referral/visit paths and how they related to purchase behaviour. Ideally the table would look something like this:
user_id | visit_index | marketing_channel | purchase_tf
-------------------------------------------------------
1 | 2 | organic_search | FALSE
-------------------------------------------------------
1 | 3 | direct | FALSE
-------------------------------------------------------
1 | 4 | organic_search | FALSE
-------------------------------------------------------
1 | 5 | paid_search | FALSE
-------------------------------------------------------
1 | 6 | direct | TRUE
-------------------------------------------------------
2 | 8 | social_media | FALSE
...
and so on.
I’m trying to do this in Redshift, but I’m struggling to conceptually decide how I could build out a table to do this in SQL. We have all the session information in our derived sessions table (referrer, device, authenticated user ID) and they also have transaction info in them.
I can see there needs to be some use of window functions, with the partition bounded by the previous 5 rows and the current row, but I struggling to create the upper bound (the transacting session) and order the rows inside the window.
Has anyone tried performing this type of analysis using SQL before? It’d be interested in hearing your thoughts on how this might be done.
Thanks!