The only thing that jumps out to me is the output filepath - I might be wrong, but I think it expects an absolute path (tbh with this project it’s something we chip away at for internal testing purposes rather than something we treat as a ‘product’, if that makes sense, so we haven’t had much of a focus on making it more user friendly).
I’d try either file:"$(pwd)/snowplow-enrich-event-generator/kafka/my-events" (assumes bash, tested on mac), or just the absolute path to your dir - see the sketch below.
If I’m right, I think it would’ve created a dir in your root folder, and the data should be in there.
(Edit: ./ might also work - I can’t remember why I used $(pwd) in a script from months ago tbh, but I did it for some reason, so that’s my guess for this case too.)
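For what it’s worth, here’s a minimal bash sketch of what I mean - the path and the file: prefix are just lifted from your example above, and you may need to adapt it to however the generator actually takes its output option:

```bash
# Build an absolute output URI from the current directory; $(pwd) expands to an
# absolute path, so there's no ambiguity about where a relative dir would end up.
OUTPUT_URI="file:$(pwd)/snowplow-enrich-event-generator/kafka/my-events"

# Sanity-check what the generator would receive before passing it in:
echo "$OUTPUT_URI"
```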
Hope you’re both well - just wondering if there has been any progress on this PR?
FYI: Within Azure Stream Analytics, you can now write an Event Hub / Kafka stream directly to Azure Data Lake Storage (Gen 2) in Delta Parquet (and Microsoft have just changed their Stream Analytics pricing model, so it is now significantly cheaper). It is currently in preview, but all being well it won’t be long until it’s GA.
The Kafka source PR is from an external contributor, so we can’t really provide a timeline (unless you’d like to step in and help get it over the line, of course).
Regarding the overall Azure support, stay tuned for announcements next week! It looks like we’ll be able to release the new lake loading component this summer as planned.
Presumably, in my set-up, I can use just the Transformer Kafka from the RDB Loader, not worry about the Loader Snowflake implementation, and then consume the Transformer Kafka output myself in a custom application - is that right?
I was hoping that the Transformer Kafka output was another Kafka stream, but it looks like it’s blob storage - is that right? To switch it from blob storage to Kafka stream would, presumably, require dev work my side?
Correct. It’s an intermediate blob storage location that’s used as a staging area by the loader. If you are already reading from the enriched Kafka/EventHubs stream (which the Enrich application writes to), then you will not benefit from Transformer/Loader.
You might, however, benefit from our Terraform modules to run the rest of the pipeline. And hopefully, in a few weeks we should have a new dedicated lake loader as well.
Thanks for the quick response, @stanch, and congrats again on this Azure integration milestone!
The reason I mention the Transformer is that I was looking to apply some transformations to the enriched data, so I thought this would be needed…
When you mention “dedicated lake loader”, does that mean the intermediate blob storage location will no longer be needed and the data will be loaded directly into the lake? In a different format, for example Delta Parquet…?
You could say “Transformer” is an unfortunate name, but then again it used to be called “Shredder”. It does some pre-canned transformation that’s needed for the Loader, but it’s not meant to be used standalone.
For transformations, I think what you are looking for is Snowbridge with Kafka input (I know, I know), or Benthos.
@steve.gingell I’ve implemented the tests that we needed; it’s in PR review now. There’s some cleanup needed, but once I manage to find time to get that done and get a final review, it’ll be released.
@steve.gingell it’s now released. I recommend using the prod asset over the RC I pointed you to, as it contains vulnerability fixes. There should be no difference apart from that, though.
Thanks, @stanch - appreciate the update and apologies for the delayed response.
Things are progressing on my side, but, for whatever reason, I am unable to reference environment variables (stored locally in my .env file) from within my .hocon config file for the Kafka (Event Hubs) collector - it only appears to work when hard-coded, i.e.
Hi @Colm - sorry to bother you, but any ideas why I’m unable to reference environment variables (stored locally in my .env file) within my .hocon config file for the Kafka (Event Hubs) collector?
Any help will be gratefully received, as I do not want to be hard-coding sensitive data in my .hocon file…
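For reference, a rough sketch of how this usually fits together - the variable name, key path, image tag and file paths below are purely illustrative, not taken from your setup. HOCON supports optional environment substitution with ${?VAR}, but the collector won’t read a .env file by itself; the variables have to be present in the process environment when it starts:

```bash
# In config.hocon, reference the variable with HOCON's optional substitution
# syntax (standard Typesafe Config behaviour); the key path here is made up:
#
#   brokers = ${?KAFKA_BROKERS}
#
# The .env file is never read by the collector itself, so the variables need to
# be exported into the environment it runs in. Two common ways to do that:

# 1. Running the collector directly: export everything in .env into the shell
#    first (assumes plain KEY=VALUE lines, no spaces around '=').
set -a; source .env; set +a

# 2. Running in Docker: pass the file with --env-file so the container sees the
#    variables (image tag and mount paths are illustrative).
docker run --env-file .env \
  -v "$(pwd)/config.hocon:/snowplow/config.hocon" \
  snowplow/scala-stream-collector-kafka:latest \
  --config /snowplow/config.hocon
```

If the values still come through empty, it’s usually a sign the variables aren’t actually in the environment of the collector process, rather than a problem with the .hocon syntax itself.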