Regardless of loading to a database like BigQuery, you can set up a separate consumer on the enriched stream to do real-time processing.
So even if I go ahead with the native loader, I would still need to process enriched events directly from Pub/Sub to maintain data freshness.
So if I have self-describing event schemas and POJOs derived from them, would I be able to deserialise the events into a Dataset? I am just asking based on your experience. Hope you don’t mind.
If by this you mean you’d like to deliver the data to real-time systems, rather than do something like loading to a database, then this is likely fairly straightforward (depending on the use case). The data will be in the enriched TSV format, which is a TSV with some JSON/JSON array columns. You can use the analytics SDKs to transform that into JSON.
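To make the "TSV with some JSON columns" shape concrete, here is a minimal stdlib-only sketch of the idea. It is not the analytics SDK and not the real column layout: `COLUMNS`, `JSON_COLUMNS`, and the sample event are hypothetical and heavily shortened for illustration (the real enriched format has many more columns, which is what the SDKs handle for you).

```python
import json

# Hypothetical, heavily shortened column layout for illustration only.
# The real enriched event has many more columns, in a fixed order.
COLUMNS = ["app_id", "event_id", "event_name", "contexts", "unstruct_event"]

# Columns assumed to carry JSON payloads rather than plain strings.
JSON_COLUMNS = {"contexts", "unstruct_event"}


def enriched_tsv_to_dict(line: str) -> dict:
    """Turn one tab-separated line into a dict, parsing the JSON columns."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")

    event = {}
    for name, raw in zip(COLUMNS, values):
        if raw == "":
            event[name] = None              # empty field: no value present
        elif name in JSON_COLUMNS:
            event[name] = json.loads(raw)   # JSON column: parse into objects
        else:
            event[name] = raw               # plain column: keep the string
    return event


# Made-up sample line: one self-describing event, no contexts.
sample_line = "\t".join([
    "shop",
    "e1",
    "checkout",
    "",
    json.dumps({
        "schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",
        "data": {
            "schema": "iglu:com.acme/checkout/jsonschema/1-0-0",
            "data": {"total": 42.5},
        },
    }),
])

event = enriched_tsv_to_dict(sample_line)
print(json.dumps(event, indent=2))
```

In a real consumer you would run something like this (or the SDK's equivalent) on each message read from the enriched stream, then hand the resulting JSON to your real-time system.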
Loading the data into an in-memory Spark DataFrame or similar is not difficult, but you need to be aware that the schemas for the self-describing JSON can change, and account for that in how you architect your job.
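One way to account for changing schemas is to read the version from each self-describing JSON and branch on it, instead of binding to a single fixed class. The sketch below assumes a hypothetical `com.acme/checkout` schema, where `currency` was added as an optional field in a later 1-x-x version and `total` was renamed to `amount` in 2-0-0; those details are invented for illustration, and only the `iglu:vendor/name/format/model-revision-addition` URI shape is taken from the format itself.

```python
import re
from typing import NamedTuple

# Iglu schema URIs look like: iglu:vendor/name/format/MODEL-REVISION-ADDITION
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[^/]+)/(?P<name>[^/]+)/(?P<format>[^/]+)/"
    r"(?P<model>\d+)-(?P<revision>\d+)-(?P<addition>\d+)$"
)


class SchemaKey(NamedTuple):
    vendor: str
    name: str
    model: int
    revision: int
    addition: int


def parse_schema_key(uri: str) -> SchemaKey:
    """Split an Iglu schema URI into vendor, name, and version parts."""
    match = IGLU_URI.match(uri)
    if match is None:
        raise ValueError(f"not a valid Iglu schema URI: {uri}")
    return SchemaKey(
        vendor=match["vendor"],
        name=match["name"],
        model=int(match["model"]),
        revision=int(match["revision"]),
        addition=int(match["addition"]),
    )


def read_checkout(sdj: dict) -> dict:
    """Normalise a (hypothetical) checkout event across schema versions."""
    key = parse_schema_key(sdj["schema"])
    data = sdj["data"]
    if (key.vendor, key.name) != ("com.acme", "checkout"):
        raise ValueError(f"unexpected schema: {sdj['schema']}")

    if key.model == 1:
        # Assumed non-breaking change: 'currency' is optional, so older
        # events simply lack it and we fall back to None.
        return {"total": float(data["total"]), "currency": data.get("currency")}
    if key.model == 2:
        # Assumed breaking change: 'total' was renamed to 'amount'.
        return {"total": float(data["amount"]), "currency": data.get("currency")}
    raise ValueError(f"unsupported model version: {key.model}")


old = {"schema": "iglu:com.acme/checkout/jsonschema/1-0-0", "data": {"total": 10}}
new = {
    "schema": "iglu:com.acme/checkout/jsonschema/2-0-0",
    "data": {"amount": 12.5, "currency": "EUR"},
}
print(read_checkout(old))
print(read_checkout(new))
```

The same idea carries over to a Spark job: keep the JSON columns as strings or loosely typed structures at the edge, and map them to your typed model per schema version, so a new version does not break deserialisation of the whole stream.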
If your task is to load this data into a database, it is much more complex, because the schema of the table you’re loading into is fixed, but the schema of the data evolves. So for getting the data into the database, we recommend using one of our loaders. If your real-time use case has similar constraints or difficulties with evolving schemas, then the task will of course be much more challenging.
If you do need a loader, one option is to set up a Kafka pipeline, and use Snowbridge to stream that data into a separate Pub/Sub topic, which has a BigQuery loader set up on it. This would duplicate the cost of the enriched stream, but it would give you the freedom to have your Kafka infrastructure power real-time use cases, while also loading to BigQuery without having to fork anything.
There’s absolutely no requirement to set up a loader at all though. We typically recommend doing so as the data tends to be very valuable for analytics use cases, but if your primary goal is to power real-time systems, you can do so without a database at all.