We are currently setting up a Snowplow pipeline to ingest events from multiple sources, and we have different SQL enrichments to do based on the type (schema) of the event.
- Is it advisable to set up multiple pipelines for the different types of events instead, for scalability/efficiency of enrichment (since some enrichments are optional for some events)?
Hi @yuaho ,
This is feasible, but it's usually not a good solution: multiple pipelines are more expensive to run and more work to operate. It's better to think about ways to process all the different sources within the same pipeline.
That's correct: it's not yet possible to run several SQL enrichments in one pipeline.
It depends on whether you want the value attached to the event to reflect the state of the database at the time the event is enriched, or at the time the model runs.
I'll just add some context from working with other users on similar setups: it's worth evaluating whether you actually need this to happen in the pipeline itself.
In my experience, quite often the use case behind a user aiming to use the SQL enrichment is just as well served in data modeling after loading to the database.
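For example, the most common use case behind the SQL enrichment (attaching reference data keyed on a field of the event) can usually be expressed as a plain join in the warehouse model after loading. This is just a sketch; `atomic.events` follows the usual Snowplow loading convention, while the `reference.customers` table and its columns are hypothetical:

```sql
-- Sketch: attach reference data in the data model after loading,
-- instead of a pipeline-time SQL enrichment.
-- Table and column names below (other than atomic.events) are illustrative.
SELECT
    e.event_id,
    e.user_id,
    e.derived_tstamp,
    c.plan_tier,        -- the value the SQL enrichment would have attached
    c.account_manager
FROM atomic.events AS e
LEFT JOIN reference.customers AS c
    ON c.user_id = e.user_id
WHERE e.event_name = 'checkout_started';
```

A side benefit of this approach is the point made above about timing: the joined value reflects the reference table at model run time, and you can re-run the model when the reference data changes, which a pipeline-time enrichment cannot do retroactively.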
If you DO have a requirement for this enrichment in real-time, and just one SQL enrichment won’t cut it, then it may be worth considering if hosting an API service would fit the bill instead.
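For the API route, Snowplow's API Request Enrichment lets the pipeline call an HTTP service per event and attach the response as a derived context. A rough sketch of its JSON configuration is below; the URI, input key, and output schema are placeholders for illustration, so check the enrichment's documented schema for the exact field names and versions:

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow.enrichments/api_request_enrichment_config/jsonschema/1-0-0",
  "data": {
    "vendor": "com.snowplowanalytics.snowplow.enrichments",
    "name": "api_request_enrichment_config",
    "enabled": true,
    "parameters": {
      "inputs": [
        { "key": "user", "pojo": { "field": "user_id" } }
      ],
      "api": {
        "http": {
          "method": "GET",
          "uri": "https://api.example.com/users/{{user}}",
          "timeout": 5000,
          "authentication": {}
        }
      },
      "outputs": [
        {
          "schema": "iglu:com.example/user_context/jsonschema/1-0-0",
          "json": { "jsonPath": "$.record" }
        }
      ],
      "cache": { "size": 3000, "ttl": 60 }
    }
  }
}
```

Because you control the service behind the URI, it can branch on event type internally, which sidesteps the one-SQL-enrichment limit, at the cost of operating that service.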
Hope that helps.