Can't query events table

Hello everyone!

We use Snowplow for analytics at our company, and we have run into a problem querying the events table. We deployed Snowplow on AWS using Terraform, and we use Metabase as the UI tool for analysis.

Over time we have uploaded a lot of schemas and the events table has grown a lot. It has become so big that Metabase is no longer able to load the table.

I am also not able to query the table directly from the Postgres DB - the query just returns `Killed`.

So my question: is there any way to optimize the events table or the way we query it? Maybe I can split the table somehow? What solutions are there for this problem?

Thank you!

So, the events table is basically a master historical record of every event you've ever tracked - which isn't well suited to frequent access or analytics workloads. Also, by its nature, a plain list of events isn't a great starting point for analysis, since analysis generally wants to start from some business context (which is achieved with simple aggregations) - for example, page ping events alone don't tell you anything, but aggregated up to page view level they expose a lot of rich information upon which a lot of interesting analysis is based.
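
To make that concrete, here's a minimal sketch of that kind of aggregation (assuming the standard atomic.events columns such as event_name, domain_sessionid, page_urlpath and derived_tstamp, and the default 10-second page ping heartbeat - adjust both to your setup):

```sql
-- Rough roll-up of raw events to page/session level.
-- Assumes standard atomic.events columns and a 10-second
-- page ping heartbeat; both may differ in your setup.
SELECT
    domain_sessionid,
    page_urlpath,
    MIN(derived_tstamp)                                   AS first_seen,
    COUNT(*) FILTER (WHERE event_name = 'page_view')      AS page_views,
    COUNT(*) FILTER (WHERE event_name = 'page_ping') * 10 AS approx_seconds_engaged
FROM atomic.events
WHERE derived_tstamp >= CURRENT_DATE - INTERVAL '7 days'  -- limit the scan
GROUP BY domain_sessionid, page_urlpath;
```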

For that reason, the recommended way to interact with the data for analysis is to set up a data modeling step in your pipeline, which aggregates the raw events into something more intelligible from an analysis point of view. Typically the output tables of that modeling process are what you would layer Metabase or similar tools over.
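
As a sketch of what that looks like (the derived schema and table names here are just placeholders), the modeling step would periodically materialise the aggregation as its own table, and you would point Metabase at that table instead of atomic.events:

```sql
-- Materialise the modeled output as its own table so BI tools
-- never have to scan the raw events table. Names are illustrative.
CREATE SCHEMA IF NOT EXISTS derived;

CREATE TABLE derived.page_views AS
SELECT
    domain_sessionid,
    page_urlpath,
    MIN(derived_tstamp)                                   AS first_seen,
    COUNT(*) FILTER (WHERE event_name = 'page_ping') * 10 AS approx_seconds_engaged
FROM atomic.events
GROUP BY domain_sessionid, page_urlpath;
```

Re-run that (or an incremental variant) on a schedule, and Metabase only ever sees the small derived table.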

We have a bunch of standard data models, which you'll find in the docs I linked above. I recommend starting with the dbt ones - they're the most accessible starting point.

Hope that helps!

Thank you for your reply.

Actually, we can't afford any data modeling for now. dbt is not suitable for us, and I don't quite understand how sql-runner works. Does it create a new table in my Postgres DB? Because that is what we actually need. And where do I need to run it - locally?

Or maybe there are other ways to do what we want?

Thank you

sql-runner runs locally and uses a combination of YAML and SQL files; the YAML organises the SQL into steps. Once it's fully configured and the Snowplow playbooks are arranged into the right steps, the final output will be views or tables (up to you) covering aggregations of users, sessions and page views, which is an extremely powerful starting point to build on. That being said, the runner is more configuration-heavy - dbt is way more accessible. Either the dbt or the sql-runner approach will give you what you need here. You can then build on the models with joins - for example a structured events view (if you use structured events), or views over your own custom contexts/unstructured events.
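
For a sense of how the pieces fit together, here is roughly what a playbook looks like, based on the format in the sql-runner README (the connection details and file paths below are placeholders):

```yaml
# Playbook sketch: each step runs one or more SQL files against
# the target database. Connection details are placeholders.
:targets:
- :name:     "Our Postgres database"
  :type:     postgres
  :host:     my-db-host
  :database: snowplow
  :port:     5432
  :username: analytics
  :password: ""
  :ssl:      false
:steps:
- :name: 01-page-views
  :queries:
  - :name: page-views
    :file: sql/01-page-views.sql   # e.g. a CREATE TABLE ... AS SELECT
    :template: true
- :name: 02-sessions
  :queries:
  - :name: sessions
    :file: sql/02-sessions.sql
    :template: true
```

You invoke it locally with something like `sql-runner -playbook playbook.yml`, and the tables or views each step creates land in the target database - so yes, it writes straight into your Postgres DB.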
