API enrichment from BQ vs batch update later

Can you suggest which approach is better: getting enriched data in real time from BigQuery via API enrichment, or updating the data directly once a day later?

I think this really depends on your use cases.

Using the API/SQL Enrichment means your contexts/entities are present in your atomic events table as soon as the row loads into BigQuery. This can be very handy if that data needs to be consumed from atomic events in near real time, or if you want to consume it from a Pub/Sub stream before it even hits BigQuery. However, you’ll probably increase your BigQuery invoice slightly by running queries against it, and you’ll introduce some minor lag into your pipeline by calling another API (not much, but if you have other near-real-time use cases, an extra second or two might matter). Additionally, you’re creating another dependency in your pipeline by calling an API; if that API goes down, you’ll likely start seeing failed events.
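To illustrate the dependency and latency point, here’s a minimal sketch, not Snowplow’s actual enrichment internals: the event fields, the `fetch` callable, and the status values are all hypothetical. The shape of the problem is the same, though: every event waits on the external call, and an outage turns into failed events.

```python
# Hypothetical sketch of an enrichment step that depends on an external API.
# `fetch` stands in for the real HTTP call; all names are illustrative only.

def enrich_event(event: dict, fetch, timeout_s: float = 2.0) -> dict:
    """Attach enrichment data to an event, routing it to 'failed' on error."""
    try:
        # Every event now waits on this call -- even a fast API adds latency.
        context = fetch(event["user_id"], timeout=timeout_s)
        return {**event, "context": context, "status": "enriched"}
    except Exception as exc:
        # If the API is down or slow, the event ends up in the failed stream
        # instead of landing enriched in the atomic events table.
        return {**event, "status": "failed", "error": str(exc)}

def healthy_fetch(user_id, timeout):
    return {"plan": "pro"}

def broken_fetch(user_id, timeout):
    raise TimeoutError("enrichment API down")

ok = enrich_event({"user_id": "u1"}, healthy_fetch)
down = enrich_event({"user_id": "u1"}, broken_fetch)
```

Here `ok` carries the extra context, while `down` is marked failed rather than loaded, which is exactly the new failure mode the extra dependency introduces.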

Most use cases don’t require these contexts/entities to be on the atomic events row in near real time; in fact, I’d say 95% don’t. So I’d always start by doing the join in a data model when you create derived tables (like the Snowplow Web and Mobile models do), perhaps once an hour or once a day, depending on how important it is to be able to query the data close to the time the event occurred. If that turns out not to be sufficient, then it’s time to consider the API/SQL enrichments.
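For the batch approach, the join can live in a scheduled data model job. A minimal sketch, assuming hypothetical table and column names (`atomic.events`, `enrichment.user_attributes`, `derived.events_enriched`, and the `plan`/`segment` columns are placeholders for your own schema), using the `google-cloud-bigquery` client:

```python
# Sketch of a daily batch join into a derived table. Table and column names
# are hypothetical placeholders -- substitute your own schema.

def build_daily_join_sql(events_table: str, enrichment_table: str,
                         target_table: str) -> str:
    """Build SQL that joins enrichment data onto yesterday's events."""
    return f"""
    CREATE OR REPLACE TABLE `{target_table}` AS
    SELECT e.*, u.plan, u.segment
    FROM `{events_table}` AS e
    LEFT JOIN `{enrichment_table}` AS u
      ON e.user_id = u.user_id
    WHERE DATE(e.collector_tstamp) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
    """

if __name__ == "__main__":
    # Requires GCP credentials; run this from your scheduler
    # (e.g. cron, Airflow, or a BigQuery scheduled query instead).
    from google.cloud import bigquery

    client = bigquery.Client()
    sql = build_daily_join_sql("atomic.events",
                               "enrichment.user_attributes",
                               "derived.events_enriched")
    client.query(sql).result()  # blocks until the job completes
```

Because the join runs once a day against already-loaded data, the pipeline itself has no extra API dependency, and an enrichment-source outage only delays the derived table rather than failing events.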
