Collector RabbitMQ and Enrich RabbitMQ released

This will depend on how other things in our roadmap pan out, but I don’t think it will happen this month. There is a good chance it happens during the summer though :slight_smile:

This is awesome, thanks @Colm - just what I’m looking for.

I’ve just tested it and although the following command appears to run without error, no events are being saved to file - any ideas? Appreciate any assistance you can offer.

sudo ./snowplow-event-generator --config ./snowplow-enrich-event-generator/config.hocon --output file:/snowplow-enrich-event-generator/kafka/my-events

Here are the contents of my config file:

"seed": 1
"payloadsTotal": 1000
"withRaw": true
"withEnrichedTsv": true
"withEnrichedJson": true
"compress": false
"payloadsPerFile": 1000
"eventPerPayloadMin": 1
"eventPerPayloadMax": 1
"duplicates": {
  "natProb": 0.0
  "synProb": 0.0
  "natTotal": 1
  "synTotal": 1
}
"timestamps": {
  "type": "Fixed"
  "at": "2022-02-01T01:01:01z"
}

The only thing that jumps out to me is the output filepath - I might be wrong but I think it expects an absolute path. (Tbh, this project is something we chip away at for internal testing purposes rather than something we treat as a ‘product’, if that makes sense, so we haven’t had much of a focus on making it more user friendly.)

I think try with either file:"$(pwd)/snowplow-enrich-event-generator/kafka/my-events" (assumes bash, tested on mac), or just the absolute path to your dir.

If I’m right I think it would’ve created a dir in your root folder, and the data will be in there. :slight_smile:

(Edit: ./ might also work, I can’t remember why I used $(pwd) in a script from months ago tbh, but I did that for some reason so that’s my guess for this case too :smiley: )
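To make the path issue above concrete, here is a minimal bash sketch. It assumes (as guessed above, not confirmed) that the generator treats everything after file: as a plain filesystem path. The variable names are made up for illustration; only the directory name and the flags in the comment come from the command earlier in the thread.

```shell
#!/usr/bin/env bash
# Sketch only: shows which directory each form of the --output value points at.
# Assumption: everything after "file:" is used as a filesystem path.

# Directory name taken from the original command.
relative_dir="snowplow-enrich-event-generator/kafka/my-events"

# What the original command passed: the leading "/" anchors the path at the root folder.
original_uri="file:/${relative_dir}"

# What was probably intended: an absolute path anchored at the current directory.
absolute_uri="file:$(pwd)/${relative_dir}"

# Strip the "file:" prefix to see the path each value resolves to.
original_path="${original_uri#file:}"
absolute_path="${absolute_uri#file:}"

echo "original resolves to: ${original_path}"
echo "intended resolves to: ${absolute_path}"

# The adjusted command would then look something like this (not run here):
#   sudo ./snowplow-event-generator \
#     --config ./snowplow-enrich-event-generator/config.hocon \
#     --output "${absolute_uri}"
```

If the guess is right, the events from the first run should be sitting under /snowplow-enrich-event-generator in the root folder rather than in the working directory.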

A few tweaks and we’re up and running. You’re a star - appreciate the help and quick response.

NP at all. Keep us posted, keen to hear how you get on

Hi @stanch & @Colm -

Hope you’re both well - just wondering if there has been any progress on this PR?

FYI: Within Azure Stream Analytics, you can now write an Event Hub / Kafka stream directly to Azure Data Lake Storage (Gen 2) in Delta Parquet (and Microsoft have just changed their Stream Analytics pricing model, so it is now significantly cheaper :slight_smile: ). It is currently in preview, but all being well it won’t be long until it’s GA.


Hi @stanch & @Colm -

Just wondering if there was an update on the PR above? Champing at the bit, as you may have gathered :smiley:


Hi @stanch -

Hope all is good with you?

Wondering if there is any news / anything I can do to help on this one?

Look forward to hearing from you.


Hi @steve.gingell,

The Kafka source PR is from an external contributor, so we can’t really provide a timeline (unless you’d like to step in and help get it over the line, of course :slight_smile:).

Regarding the overall Azure support, stay tuned for announcements next week! It looks like we’ll be able to release the new lake loading component this summer as planned.

Hi @stanch -

Good to hear from you and I look forward to hearing next week’s announcements :slight_smile: Very exciting.

If I need to return to the PR option, not sure I have the technical expertise, but if it’s a case of trying to coordinate, then happy to help.

Thanks again for getting back to me and roll on next week :slight_smile:


Hi @stanch -

Just saw the announcement and read your article: Announcing open source Azure support | Snowplow - very cool!!

Presumably, in my set-up, I can use just the Transformer Kafka from the RDB Loader, not worry about the Loader Snowflake implementation, and then consume the Transformer Kafka output myself in a custom application - is that right?

Thanks again for all of your help, Nick.


I was hoping that the Transformer Kafka output was another Kafka stream, but it looks like it’s blob storage - is that right? To switch it from blob storage to Kafka stream would, presumably, require dev work my side?

Correct. It’s an intermediate blob storage location that’s used as a staging area by the loader. If you are already reading from the enriched Kafka/EventHubs stream (which the Enrich application writes to), then you will not benefit from Transformer/Loader.

You might, however, benefit from our Terraform modules to run the rest of the pipeline. And hopefully, in a few weeks we should have a new dedicated lake loader as well.

Thanks for the quick response, @stanch, and congrats again on this Azure integration milestone :slight_smile:

The reason I mention the Transformer is that I was looking to apply some transformations to the enriched data, so thought this would be needed …

When you mention “dedicated lake loader”, does that mean the intermediate blob storage location will no longer be needed and the data will be loaded directly into the lake? In a different format, for example delta parquet…? :slight_smile:

You could say “Transformer” is an unfortunate name, but then again it used to be called “Shredder” :man_shrugging: It does some pre-canned transformation that’s needed for the Loader, but it’s not meant to be used standalone.

For transformations, I think what you are looking for is Snowbridge with Kafka input (I know, I know), or Benthos.

Maybe :slight_smile:

Thanks, Nick.

So to progress the Snowbridge with Kafka input, I need to reach out to the contributor who raised the PR and take it from there, right?

@steve.gingell I’ve implemented the tests that we needed, it’s in PR review now. There’s some cleanup needed but once I manage to find time to get that done and get a final review it’ll be released.

In the meantime I’ve released a pre-release asset - you can use version 2.2.0-rc1 to experiment with. Here’s a config example for the source.

@Colm - you’re an absolute star! Thanks for this; really appreciate it.

I’ll get cracking and let you know how I get on :smiley:

Thanks again,

@steve.gingell it’s now released. I recommend using the prod asset over the rc I pointed you to as it contains vulnerability fixes. Should be no difference apart from that though. :slight_smile:

