Using a manifest to keep track of what data has been modeled (incremental modeling)

A common pattern amongst our users is to maintain a manifest table that records which data has been modeled and which hasn’t. At the start of each data modeling run, you would run a query like this to identify all the ETL runs that have occurred in the last week and select any that have not yet been processed:

INSERT INTO scratch.etl_tstamps (

  WITH recent_etl_tstamps AS ( -- return all recent ETL timestamps

    SELECT DISTINCT etl_tstamp FROM atomic.events
    WHERE collector_tstamp > DATEADD(week, -1, CURRENT_DATE) -- restrict the table scan to the last week (collector_tstamp is the SORTKEY) for performance reasons

  )

  -- store all ETL timestamps that are not in the manifest (i.e. that have NOT been processed)

  SELECT DISTINCT etl_tstamp FROM recent_etl_tstamps
  WHERE etl_tstamp NOT IN (SELECT etl_tstamp FROM datamodeling.etl_tstamps)
  ORDER BY 1

);
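
For this to work, both the scratch table and the manifest table need to exist up front, and they need to share the same structure so that the ALTER TABLE APPEND in the final step succeeds. A minimal sketch of the DDL, assuming etl_tstamp has the same TIMESTAMP type as in atomic.events:

CREATE TABLE IF NOT EXISTS datamodeling.etl_tstamps ( -- the persistent manifest of processed runs
  etl_tstamp TIMESTAMP
);

CREATE TABLE IF NOT EXISTS scratch.etl_tstamps ( -- the per-run working copy of unprocessed runs
  etl_tstamp TIMESTAMP
);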

The values in scratch.etl_tstamps are then used to subset the data that subsequent aggregations are computed on, so that only the new data is aggregated; these new aggregates are then combined with the aggregates computed on previous runs.
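
For example, a sketch of the subsetting step might look like the following (the derived.page_views_per_day table and its columns are hypothetical, chosen just to illustrate the pattern):

INSERT INTO derived.page_views_per_day ( -- hypothetical aggregate table

  SELECT
    DATE_TRUNC('day', derived_tstamp) AS day,
    COUNT(*) AS page_views
  FROM atomic.events
  WHERE event = 'page_view'
    AND etl_tstamp IN (SELECT etl_tstamp FROM scratch.etl_tstamps) -- only aggregate runs that have not yet been processed
  GROUP BY 1

);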

Once the data modeling step has been completed successfully, the etl_tstamp values that have now been processed are added to the manifest as one of the final steps in the data model:

ALTER TABLE datamodeling.etl_tstamps APPEND FROM scratch.etl_tstamps;
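
Note that ALTER TABLE APPEND moves rows rather than copying them, so scratch.etl_tstamps is left empty and ready for the next run. On warehouses that don’t support ALTER TABLE APPEND, a functionally equivalent (if slower) alternative, assuming the same two tables, is:

INSERT INTO datamodeling.etl_tstamps (SELECT etl_tstamp FROM scratch.etl_tstamps);
TRUNCATE scratch.etl_tstamps; -- the manifest now records these runs as processed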