Data Modeling - events_staged table is empty

Hi Snowplowers,
Our pipeline is deployed on GCP and uses BigQuery as the database.

I successfully ran data-models-master and was able to generate the page_views/sessions/users derived tables.

However, while playing with the code logic I deleted the ‘derived’ and ‘scratch’ datasets and then reran the data modeling, but it did not generate any data in the new derived tables.

I double-checked the events_staged table in ‘scratch’ and it was empty. I believe this is why no data was processed in this run.

I tried modifying the start_date in the 01-base-main playbook (YAML) but did not have any luck.

Is there a right way to get all historical data processed into the events_staged table again?
Thank you

Hey @kuangmichael07,

It’s a complicated model but I’ll give you the easiest unblocking solution first, then I’ll do my best to add explanations that might help you understand the relevant pieces.

The easiest unblock

All of the standard playbook directories have XX-destroy playbooks like this one. Run them all, then run the model from the top again. This will essentially tear everything down and start again.
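
For reference, a SQL Runner destroy playbook looks roughly like the sketch below. The target settings, step names and file path are placeholders (the real ones live in the XX-destroy playbooks shipped with your version of the model), but it should give you an idea of what "tear everything down" means in practice:

```yaml
# Illustrative sketch of a destroy playbook for BigQuery - step names,
# file paths and variable names are assumptions; check the XX-destroy
# playbooks in your copy of data-models for the real ones.
:targets:
  - :name:    "BigQuery"
    :type:    bigquery
    :project: my-gcp-project        # hypothetical project id
:variables:
  :scratch_schema: scratch
  :output_schema:  derived
:steps:
  - :name: 99-destroy
    :queries:
      - :name: 99-destroy
        :file: standard/00_setup/99-destroy.sql   # drops the module's scratch, derived and manifest tables
        :template: true
```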

The general advice I have for modifications or additions to the model is that it supports ‘plugin’ customisation. You can find a guide to that here. Of course you’re free to change whatever you like in the standard model, but do so with the understanding that this is akin to modifying the source code of a tracker - once you’ve forked the logic it becomes hard for us to offer much support if you hit issues.

Most use cases can be done without forking though. For example, you can configure the model to skip the update to the derived.page_views table and instead use your own custom module to create a derived.page_views_custom table.
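
As a sketch of what that looks like in practice, the module playbooks expose variables that control whether the derived table gets updated. The names below are my recollection of the standard V1 web model (skip_derived and stage_next in particular), so double-check them against the page views playbook in your copy before relying on them:

```yaml
# Sketch of the :variables: block of the page views module playbook.
# Variable names are assumptions based on the standard V1 web model -
# verify against your own page-views playbook.
:variables:
  :scratch_schema: scratch
  :output_schema:  derived
  :skip_derived:   true   # skip the step that updates derived.page_views
  :stage_next:     true   # keep staged data available for a custom module to consume
```

Your custom module then reads the staged data and writes derived.page_views_custom itself.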


Now for a couple of explanations of the details:

It sounds like the tables aren’t updating because, despite you deleting the derived tables, the manifests probably remained the same. The manifests determine what data is processed into the base module and through the rest of the model.

I double-checked the events_staged table in ‘scratch’ and it was empty. I believe this is why no data was processed in this run.

While you’re probably correct, note that the data in scratch.events_staged normally gets dropped at the end of the model run. It only sticks around if you have set cleanup_mode to “debug” or “trace” in the base module’s playbook.
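
For example, the relevant part of the base playbook’s variables would look something like this (value names as in the standard model; the default mode drops the scratch tables):

```yaml
# Sketch of the base playbook variable that controls cleanup.
# "debug" or "trace" keeps scratch.events_staged (and friends) around
# after the run so you can inspect them; the default drops them.
:variables:
  :cleanup_mode: debug
```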

Hi @Colm ,
Thank you for the quick reply.
I think I might have misunderstood the usage of the start_date option.
For example, our application has been hooked up to Snowplow since Sep 1st this year, and I want to run the data modeling over all historical data. Should I set it to 2021-09-01? And should the result then cover all days from Sep 1st to now?

BTW, I selected all metadata from datamodel_meta and saw that quite a lot of rows were processed, but no derived result is showing up.

Thank you

Yes, start_date determines what date the model starts from on its first run. If it’s not the first run, it’ll use the manifests to determine where it should start. If you ran the model already and then changed the start date, this will do nothing on the next run of the model; the start date is only used when the manifest is empty.
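
So for your case, setting it in the 01-base-main playbook along these lines (date format assumed to be YYYY-MM-DD) will make a genuinely fresh run start from Sep 1st:

```yaml
# Sketch of the start_date setting in the 01-base-main playbook.
# Only takes effect when the manifests are empty, i.e. on a fresh run
# (for example after running all the XX-destroy playbooks).
:variables:
  :start_date: "2021-09-01"
```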

BTW, I selected all metadata from datamodel_meta and saw that quite a lot of rows were processed, but no derived result is showing up.

Well, that’s not surprising - by your own account you ran the model and then subsequently dropped the derived tables.

Like I explained, you just need to run the destroy playbooks and re-run the model.