Full-Refresh Protection

Stella_Oppermann · March 10, 2023, 12:14pm

Hello,

We were wondering about the full-refresh logic in the model and have a question about it. Why are only the manifest tables protected? If you do a full refresh and the data in the page view and session models is lost, then the data in the manifest tables is not much use and you would want to reprocess anyway, wouldn't you? Is there anything against protecting your own tables with the macro?

Best wishes,
Stella

Ryan · March 10, 2023, 12:27pm

Hey @Stella_Oppermann, the logic for this was introduced a little before my time, but I can give an idea about it. One reason for this protection is that the manifest table is used by all (derived) tables. That means even if you did a dbt run with a full refresh but selected just your page views table, it would refresh the whole manifest table, and with it the state every other derived table relies on, which isn’t great!

You’re right that if you just do a full refresh of the whole package, there’s not a lot of point. In general we combine it with a select flag and the models_to_remove variable to refresh a specific derived table from the start date (you can see an example here).
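
A toy Python model may make the difference between the two approaches concrete. It is a sketch under simplifying assumptions, not the package's implementation: it assumes the manifest is just a mapping from model name to the last date processed, and START_DATE, next_start and remove_models are hypothetical names. Wiping the whole manifest (what an unprotected full refresh would do) sends every derived model back to the start date, while dropping a single entry, roughly what the models_to_remove variable achieves, resets only that model.

```python
from datetime import date

# Hypothetical stand-in for the package's configured start date.
START_DATE = date(2023, 1, 1)


def next_start(manifest: dict[str, date], model: str) -> date:
    """Where a model resumes: its last processed date, or START_DATE if untracked."""
    return manifest.get(model, START_DATE)


def remove_models(manifest: dict[str, date], models_to_remove: list[str]) -> dict[str, date]:
    """Return a copy of the manifest without the listed models (original is untouched)."""
    return {m: d for m, d in manifest.items() if m not in models_to_remove}


# Toy manifest: three models all processed up to the same day.
manifest = {
    "page_views": date(2023, 3, 9),
    "sessions": date(2023, 3, 9),
    "my_custom_table": date(2023, 3, 9),
}

# Whole manifest wiped: every model restarts from START_DATE.
wiped: dict[str, date] = {}
print({m: next_start(wiped, m) for m in manifest})

# Targeted reset: only my_custom_table restarts; the others keep their progress.
trimmed = remove_models(manifest, ["my_custom_table"])
print({m: next_start(trimmed, m) for m in manifest})
```

The point of the targeted reset is that the shared state for page_views and sessions survives, so only the one derived table pays the cost of reprocessing.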

You’re welcome to use the macro to protect your own tables against refresh. Just be aware that it only works on your prod target: you can still full-refresh on your dev target, and everything will be refreshed, including the manifest tables. It also won’t be applied to the config of the package’s derived tables, so if you want those protected you’ll need to add it yourself as well.
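
The target-dependent behaviour can be sketched in Python as well. It relies on dbt's documented rule for the full_refresh model config: an explicit true or false takes precedence over the --full-refresh flag, and none defers to the flag. The macro itself is modelled under an assumption: protect_config is a hypothetical stand-in that returns None on the dev target and the value of a hypothetical allow_refresh setting everywhere else.

```python
from typing import Optional


def protect_config(target_name: str, dev_target_name: str, allow_refresh: bool) -> Optional[bool]:
    """Hypothetical mirror of the protection macro (assumed behaviour, not its source)."""
    if target_name == dev_target_name:
        return None  # no override in dev: the --full-refresh flag decides
    return allow_refresh  # outside dev, False blocks full refreshes


def is_full_refreshed(full_refresh_config: Optional[bool], cli_flag: bool) -> bool:
    """dbt's rule: an explicit full_refresh config wins; None defers to the CLI flag."""
    if full_refresh_config is None:
        return cli_flag
    return full_refresh_config


# dev: the protected table is rebuilt along with everything else
print(is_full_refreshed(protect_config("dev", "dev", False), cli_flag=True))  # True
# prod: the protected table survives a --full-refresh run
print(is_full_refreshed(protect_config("prod", "dev", False), cli_flag=True))  # False
```

Because the override lives in each model's own config, every table you want protected, including the package's derived tables, needs it set individually.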

Hope that helps!

Stella_Oppermann · March 14, 2023, 9:11am

Thank you very much, Ryan! That was super helpful.
