I’ll do a bit more research around dynamic tables next week, but my initial understanding of dynamic tables and that sort of near-real-time processing suggests that, in general, our packages are not well suited to this, and adapting them to be would be very difficult.
The core reason for this is our sessionisation logic: while most individual derived tables use the standard incremental materialization, the reason we have snowplow_web_base_events_this_run is that it contains not just new events, but all events in any session that has new events (with some caveats). This core logic is why we strongly recommend people use our packages for derived processing at a session level; at an event level, a basic incremental model would work fine (a rough sketch of one is below). Because of this, and the manifest table approach, I’m not convinced it would easily work with dynamic tables (although I may be wrong after some more time spent understanding them), and it would probably still be inefficient for a real-time use case.
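For contrast, a purely event-level incremental model needs none of that session or manifest logic, as it only ever processes new events. A minimal sketch (the source and column names here are illustrative, not the package’s actual code):

```sql
-- Minimal event-level incremental model: only new events are processed,
-- no session reprocessing and no manifest table needed.
{{
    config(
        materialized='incremental',
        unique_key='event_id'
    )
}}

select
    event_id,
    domain_sessionid,
    collector_tstamp,
    event_name
from {{ source('atomic', 'events') }}

{% if is_incremental() %}
-- only pick up events newer than anything already in this table
where collector_tstamp > (select max(collector_tstamp) from {{ this }})
{% endif %}
```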
It depends somewhat on what your exact use case is for needing near-real-time data modelling. I’ll put some personal bias out here: in nearly every project I’ve seen where someone wanted near-real-time modelled data, they were actually better off using the raw events stream, or gained no benefit from the low latency. If you do need it, you may be better off using much simpler processing for a small volume of recent data, and keeping the higher-latency processing from the package for historic information.
As an example, say you do need some sessionisation and you are trying to identify sessions of users with abandoned carts (a classic example, although one where sub-minute latency is not really needed). Instead of trying to do lookups and manage manifests, you could just process all events in the last 2 hours, find sessions with a valid email, items in their cart, and no events for the last 5 minutes, and then assume those users have abandoned their cart and trigger something like an email to them (a rough sketch of this kind of query is below). It won’t be perfect, but you’ll catch the vast majority of cases. The only additional thing you’d have to manage on top is excluding sessions you’ve already contacted, but that depends again on your exact use case, and you’d only have to store those within that same 2-hour range (and your email contact tool may even have rules to avoid sending the same email within a certain window, so in this specific example you might not need to track it at all).
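To make that concrete, the whole thing can be a single query over the 2-hour window, run on whatever schedule you need. This is a rough sketch only — the table name, the user_email column, and the 'add_to_cart' event name are placeholders that will differ in your own tracking setup:

```sql
with recent_events as (

    select
        domain_sessionid,
        user_email,          -- placeholder: however you capture an identified email
        event_name,
        collector_tstamp
    from events              -- placeholder for your atomic events table
    where collector_tstamp >= dateadd('hour', -2, current_timestamp())

),

sessions as (

    select
        domain_sessionid,
        max(user_email)                                               as user_email,
        max(case when event_name = 'add_to_cart' then 1 else 0 end)  as has_cart_items,
        max(collector_tstamp)                                         as last_event_tstamp
    from recent_events
    group by 1

)

select
    domain_sessionid,
    user_email
from sessions
where user_email is not null
    and has_cart_items = 1
    -- no activity for 5 minutes => assume the cart has been abandoned
    and last_event_tstamp < dateadd('minute', -5, current_timestamp())
```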
It’s probably a little more complex than you were hoping for, but while we may support dynamic tables in the future (as dbt-snowflake is looking to add support for them in general), I don’t believe the core logic of our packages is well suited to low-latency processing. dbt in general is also a batch tool; if possible, you may be better off using the raw events via something like Snowbridge and a Spark-based transformation tool.
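For reference, a dynamic table in Snowflake is a declarative query that Snowflake itself keeps refreshed to a target lag, which is part of why it doesn’t map neatly onto our stateful, manifest-driven run logic. A sketch in raw Snowflake SQL (the warehouse and table names are placeholders):

```sql
-- Snowflake maintains this table itself, refreshing it to stay within TARGET_LAG;
-- there is no "this run" table or manifest driving which sessions get reprocessed.
create or replace dynamic table recent_page_views
    target_lag = '5 minutes'
    warehouse = transforming_wh
as
select
    domain_sessionid,
    count(*) as page_views,
    max(collector_tstamp) as last_event_tstamp
from events
where event_name = 'page_view'
group by domain_sessionid;
```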
If you can share more details about your specific use case, we might be able to offer some additional suggestions?