
The only thing that jumps out at me is the output filepath. I might be wrong, but I think it expects an absolute path. (TBH, with this project it’s something we chip away at for internal testing purposes rather than something we treat as a ‘product’, if that makes sense, so we haven’t had much of a focus on making it more user friendly.)

I’d try either file:"$(pwd)/snowplow-enrich-event-generator/kafka/my-events" (assumes bash, tested on Mac), or just the absolute path to your dir.

If I’m right I think it would’ve created a dir in your root folder, and the data will be in there. :slight_smile:

(Edit: ./ might also work, I can’t remember why I used $(pwd) in a script from months ago tbh, but I did that for some reason so that’s my guess for this case too :smiley: )
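
In case it helps, here is a minimal sketch of the idea (assuming bash; the final echo is only a placeholder for wherever your version of the generator expects the output path, since the exact flag or config key may differ):

```bash
# A relative path is resolved against the process's working directory, which may
# not be where you expect; $(pwd) pins the output location to an absolute path.
OUT_DIR="$(pwd)/snowplow-enrich-event-generator/kafka/my-events"
mkdir -p "$OUT_DIR"
echo "file:$OUT_DIR"   # paste this value into the generator's output setting
```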

A few tweaks and we’re up and running. You’re a star - appreciate the help and quick response.


No problem at all. Keep us posted; keen to hear how you get on.

Hi @stanch & @Colm -

Hope you’re both well - just wondering if there has been any progress on this PR?

FYI: Within Azure Stream Analytics, you can now write an Event Hub / Kafka stream directly to Azure Data Lake Storage (Gen 2) in Delta Parquet (and Microsoft have just changed their Stream Analytics pricing model, so it is now significantly cheaper :slight_smile: ). It is currently in preview, but all being well it won’t be long until it’s GA.

Cheers,
Steve

Hi @stanch & @Colm -

Just wondering if there was an update on the PR above? Champing at the bit, as you may have gathered :smiley:

Cheers,
Steve

Hi @stanch -

Hope all is good with you?

Wondering if there is any news / anything I can do to help on this one?

Look forward to hearing from you.

Cheers,
Steve

Hi @steve.gingell,

The Kafka source PR is from an external contributor, so we can’t really provide a timeline (unless you’d like to step in and help get it over the line, of course :slight_smile:).

Regarding the overall Azure support, stay tuned for announcements next week! It looks like we’ll be able to release the new lake loading component this summer as planned.

Hi @stanch -

Good to hear from you and I look forward to hearing next week’s announcements :slight_smile: Very exciting.

If I need to return to the PR option, I’m not sure I have the technical expertise, but if it’s a case of trying to coordinate, then I’m happy to help.

Thanks again for getting back to me and roll on next week :slight_smile:

Cheers,
Steve

Hi @stanch -

Just saw the announcement and read your article: Announcing open source Azure support | Snowplow - very cool!!

Presumably, in my set-up, I can use just the Transformer Kafka from the RDB Loader, not worry about the Loader Snowflake implementation, and then consume the Transformer Kafka output myself in a custom application - is that right?

Thanks again for all of your help, Nick.

Cheers,
Steve

I was hoping that the Transformer Kafka output was another Kafka stream, but it looks like it’s blob storage - is that right? To switch it from blob storage to Kafka stream would, presumably, require dev work my side?

Correct. It’s an intermediate blob storage location that’s used as a staging area by the loader. If you are already reading from the enriched Kafka/EventHubs stream (which the Enrich application writes to), then you will not benefit from Transformer/Loader.

You might, however, benefit from our Terraform modules to run the rest of the pipeline. And hopefully, in a few weeks we should have a new dedicated lake loader as well.

Thanks for the quick response, @stanch, and congrats again on this Azure integration milestone :slight_smile:

The reason I mention the Transformer is that I was looking to apply some transformations to the enriched data, so I thought this would be needed…

When you mention “dedicated lake loader”, does that mean the intermediate blob storage location will no longer be needed and the data will be loaded directly into the lake? In a different format, for example delta parquet…? :slight_smile:

You could say “Transformer” is an unfortunate name, but then again it used to be called “Shredder” :man_shrugging: It does some pre-canned transformation that’s needed for the Loader, but it’s not meant to be used standalone.

For transformations, I think what you are looking for is Snowbridge with Kafka input (I know, I know), or Benthos.

Maybe :slight_smile:

Thanks, Nick.

So to progress the Snowbridge with Kafka input, I need to reach out to the contributor who raised the PR and take it from there, right?

@steve.gingell I’ve implemented the tests that we needed, it’s in PR review now. There’s some cleanup needed but once I manage to find time to get that done and get a final review it’ll be released.

In the meantime I’ve released a pre-release asset - you can use version 2.2.0-rc1 to experiment with. Here’s a config example for the source.


@Colm - you’re an absolute star! Thanks for this; really appreciate it.

I’ll get cracking and let you know how I get on :smiley:

Thanks again,
Steve

@steve.gingell it’s now released. I recommend using the prod asset over the rc I pointed you to as it contains vulnerability fixes. Should be no difference apart from that though. :slight_smile:

Hi @steve.gingell, just to follow up on this in case you haven’t seen the announcement: https://snowplow.io/blog/announcing-snowplow-lake-loader/.

Thanks, @stanch - appreciate the update and apologies for the delayed response.

Things are progressing on my side but, for whatever reason, I’m unable to reference environment variables (stored locally in my .env file) within my .hocon config file for the Kafka (Event Hubs) collector. It only appears to work when hard-coded, i.e.:

Works:

"sasl.jaas.config" = "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"$ConnectionString\" password=\"Endpoint=sb://client-prod-obs-ehns-snowplow.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=XXXXXX\";"

Doesn’t work:

"sasl.jaas.config" = "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"$ConnectionString\" password=\"${PASSWORD}\";"
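
One likely culprit is a HOCON quirk: substitutions such as ${PASSWORD} are not expanded inside quoted strings, so the variable has to sit outside the quotes and be concatenated with the surrounding pieces. Here is a minimal sketch of that pattern (assuming the PASSWORD variable is actually exported into the collector’s process environment, e.g. via export or docker run --env-file; a .env file on disk is typically not read by the JVM on its own):

```hocon
# Substitutions are only resolved outside quoted strings, so split the value
# and let HOCON concatenate the pieces at load time.
"sasl.jaas.config" = "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"$ConnectionString\" password=\""${PASSWORD}"\";"
```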

Also, I’m wondering if plain text is the most secure way to go?

Hi @Colm - sorry to bother you, but any ideas why I’m unable to reference environment variables (stored locally in my .env file) within my .hocon config file for the Kafka (Event Hubs) collector?

Any help will be gratefully received, as I do not want to be hard-coding sensitive data in my .hocon file… :wink: