We’re excited to announce that we’ve added RabbitMQ support to the collector and to enrich.
We want to experiment with running a cloud-agnostic pipeline that we could deploy on any Kubernetes cluster (leveraging Helm charts). This would mean that the same pipeline could run on AWS, on GCP, on Azure, on-premise, or even locally. We’d love to hear what you think of this idea!
We already had Kafka assets that could serve this purpose, but it seems easier for us to run and maintain a RabbitMQ cluster in production than a Kafka one.
It’s important to keep in mind that these two new assets are experimental, which means that we might decide to drop them in the future depending on the results of our experiments.
If anyone is interested in testing these assets, we’ll gladly receive feedback. Issues can be opened on the collector and enrich repos.
How to run?
Instructions to set up and configure the Collector RabbitMQ asset can be found on our docs website.
Instructions to set up and configure the Enrich RabbitMQ asset can be found on our docs website.
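As an illustration only (the authoritative key names are on the docs pages linked above), a collector configuration with a RabbitMQ sink would follow the collector’s usual HOCON shape; every value below is a placeholder assumption, not the confirmed schema:

```hocon
# Hypothetical sketch of a collector config with a RabbitMQ sink.
# Key names under `sink` are assumptions -- check the docs for the real ones.
collector {
  interface = "0.0.0.0"
  port = 8080
  streams {
    good = "raw"     # queue for good raw events (placeholder name)
    bad  = "bad-1"   # queue for bad rows (placeholder name)
    sink {
      enabled  = "rabbitmq"            # assumed sink selector value
      host     = "rabbitmq.internal"   # placeholder host
      port     = 5672
      username = "snowplow"
      password = ${?RABBITMQ_PASSWORD} # read from the environment
    }
  }
}
```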
We are still testing a few different messaging frameworks, including RabbitMQ, to pick the one we could use for all our AWS, GCP and (in the future) Azure deployments. Unfortunately, at the moment it looks like we are unlikely to go with RabbitMQ because of its scalability model and maintenance complexity. This means we are likely to deprecate these modules in the near future.
Are you set on RabbitMQ, or would you be OK with a different framework? Using Snowbridge, you can stream enriched Snowplow data from Kinesis or Pub/Sub into Kafka, and that integration is not going away. Actually, we have a Product Office Hours session about that tomorrow, in case you’d like to attend.
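For context, a minimal Snowbridge configuration for the Kinesis-to-Kafka route mentioned above might look roughly like this; the stream, broker, and topic names are placeholders, and the full option list is in the Snowbridge docs:

```hcl
# Sketch of a Snowbridge config: read enriched events from Kinesis,
# write them to a Kafka topic. All names are placeholders.
source {
  use "kinesis" {
    stream_name = "enriched-good"   # placeholder Kinesis stream
    region      = "eu-west-1"
    app_name    = "snowbridge-kcl"  # checkpointing app name
  }
}

target {
  use "kafka" {
    brokers    = "broker-1:9092"       # placeholder broker list
    topic_name = "snowplow-enriched"   # placeholder topic
  }
}
```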
Dare I ask, but is RabbitMQ definitely going to be deprecated (or is it already)? I came across it recently and it would certainly be of value. I am, however, having a few config issues, so I wasn’t sure if I could trouble someone for some help, if it is to be no more…
Implement data collection and enrichment on Azure.
So far, RabbitMQ and the Collector are running as containers in Azure Container Apps, with each message triggering an Azure Function (making use of the RabbitMQ binding).
Unfortunately, although the Collector is picking up page views, for example, and states that it is sending a Thrift record to the RabbitMQ server, no such record is created and the Azure Function is not being triggered.
It feels like I’m agonisingly close to getting it working; gut instinct says it could just be a config issue with the collector, but I’m not sure…
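For anyone reproducing this setup: the RabbitMQ trigger binding described above is declared in the function’s function.json along these lines (the queue name and connection-string setting name are placeholders for whatever the collector and app settings actually use):

```json
{
  "bindings": [
    {
      "name": "msg",
      "type": "rabbitMQTrigger",
      "direction": "in",
      "queueName": "raw",
      "connectionStringSetting": "RabbitMQConnection"
    }
  ]
}
```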
With the Enrich Event Hub producer working, do you know of a way of deserialising the payload to one of the following Azure natively supported formats: Avro, JSON, or CSV? (Ideally JSON, as this opens up a LOT more doors.)
Would Snowbridge be an option?
Thanks again for your help, and hopefully I won’t need to bother you after I have this last piece of the puzzle.
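For reference, the kind of deserialisation asked about here (an enriched tab-separated row mapped onto named JSON fields) can be sketched in plain Python. The field list below is only the first handful of the canonical enriched-event columns (the full format has over a hundred), so treat it as a sketch rather than a complete mapping:

```python
import json

# First few columns of the Snowplow enriched event format, in order.
# The real format continues for 100+ tab-separated columns.
FIELDS = ["app_id", "platform", "etl_tstamp", "collector_tstamp",
          "dvce_created_tstamp", "event", "event_id"]

def enriched_tsv_to_json(row: str) -> str:
    """Map a tab-separated enriched row onto named fields, dropping empties."""
    values = row.split("\t")
    record = {name: value for name, value in zip(FIELDS, values) if value}
    return json.dumps(record)

print(enriched_tsv_to_json(
    "site\tweb\t\t2023-01-01 00:00:00\t\tpage_view\tabc-123"))
```

In practice the Snowplow Analytics SDKs (and Snowbridge’s built-in transformation) do this against the full column list, including parsing the stringified self-describing JSON columns.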
You could also use Benthos with a custom transformation (using jq or bloblang) that replicates what the Snowbridge transformation does (convert CSV to JSON and convert the stringified JSON columns to JSON).
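A Bloblang mapping along those lines might look like the sketch below; the column positions and names are illustrative (the real enriched format has a fixed, documented column order):

```yaml
# Sketch of a Benthos processor that turns an enriched TSV row into JSON.
# Column indices here are illustrative, not the authoritative positions.
pipeline:
  processors:
    - mapping: |
        let cols = content().string().split("\t")
        root.app_id   = $cols.index(0)
        root.event    = $cols.index(5)
        # A stringified self-describing JSON column, parsed into real JSON:
        root.contexts = $cols.index(52).parse_json()
```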
Or… if you don’t mind waiting a few months, we are looking into getting our regular warehouse loader working on Azure, and/or potentially loading into OneLake in Delta Parquet format. The advantage of using whatever loader we come up with here is that it will properly deal with schema evolution, i.e. if you change the schema of your data, the loader will automatically adjust the warehouse column or create a new one (we have this functionality today for AWS and GCP).
You’ve probably already guessed my next question, do you have an idea of timeframes? Ideally, I need to be up and running by the end of the month. Do you think that the Kafka source support PR will be closed by then?
And with regard to the warehouse loader working on Azure, when would “a few months” realistically be?
In the meantime, is there an easy way to pull the default event schemata, so that I can generate some dummy data?
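On pulling the default schemata: Snowplow’s standard event and context schemas are published on Iglu Central, a public registry, so one option is simply to fetch them over HTTP and generate dummy data against them. A small sketch of building such a URL (the vendor/name/version shown are real Iglu Central entries):

```python
# Sketch: build the URL of a schema on Iglu Central, Snowplow's public
# schema registry. The resulting URL can be fetched with any HTTP client,
# e.g. json.load(urllib.request.urlopen(url)).
def iglu_central_url(vendor: str, name: str, version: str) -> str:
    return f"http://iglucentral.com/schemas/{vendor}/{name}/jsonschema/{version}"

url = iglu_central_url("com.snowplowanalytics.snowplow", "screen_view", "1-0-0")
print(url)
```

The fetched JSON Schema can then drive any dummy-data generator that accepts JSON Schema as input.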