Is there any way to set the GCP stream loader up so that you can have the pubsub topics in one project and the BigQuery table in another? At the moment, it looks like you need to use the same project ID for both in the config. Is there a way around that?
Hey @iain, I don’t think that’s possible at the moment — though we have been wondering whether this is something users would be interested in. If there is a standard tool for forwarding data between Pub/Sub topics in different projects, you could use a hacky setup: enrich the data in project A without BigQuery there, so that all data is immediately sent to the failed inserts topic. If you manage to forward that topic to another project, you can then insert it with the BigQuery Repeater — I think the Loader’s and Repeater’s performance is comparable.
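For the forwarding step, one option (an untested sketch — the project, topic, and subscription names below are placeholders) is to skip a separate forwarder entirely and create the subscription that Repeater reads from in the BigQuery project, pointed at the failed-inserts topic in the pipeline project. Pub/Sub allows a subscription to live in a different project than its topic:

```shell
# Hypothetical projects: the pipeline runs in "pipeline-proj",
# BigQuery lives in "analytics-proj".
# Create a subscription in analytics-proj that pulls from the
# failed-inserts topic owned by pipeline-proj.
gcloud pubsub subscriptions create bq-repeater-sub \
  --project=analytics-proj \
  --topic=failed-inserts \
  --topic-project=pipeline-proj
```

A Repeater instance running in analytics-proj could then consume that subscription and insert into the local BigQuery table, with no cross-project topic forwarding needed.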
Sorry to revive this topic. In fact, we also have a situation where exactly this scenario would be helpful: We have differently configured pipelines in several projects. We would therefore like to define a common dataset and table in a central analytics project as a sink for the different BigQuery StreamLoader apps.
Is this now possible with the BigQuery StreamLoader? Otherwise, do you have any other idea how to implement this requirement (loading the enriched events into a BigQuery table of another project)?
It might be possible, but I suspect you’re going to have to deal with edge cases that are a pain, particularly around things like mutation, where multiple different projects could potentially be mutating the same table.
I think this depends on what you are using the table for (e.g. debugging, real-time analytics, etc.), but I’d consider a view in that project that unifies the tables from the separate projects — or, if you don’t need real-time data, a table that is incrementally rebuilt from the source tables using load_tstamp.
The view variant turned out to be the solution for my requirements. Thanks for the tip!