I recently started a repository with the goal of automating the creation of a fully functional Snowplow Analytics (Scala Streaming) stack in AWS. It’s still in development:
The idea is to make it easy for someone to spin up a basic system and then adapt it to their own use case as their analytics needs grow.
I was wondering whether Snowplow has already started any similar projects, or whether any are planned for the future. Is there anything you believe to be an “optimal setup” that would be a good starting point for most use cases to pivot from?
We of course have plenty of internal automation around deploying and monitoring Snowplow Insights, our commercial product, but we don’t have any plans to develop and maintain open-source automation scripts for snowplow/snowplow or our other projects.
If anyone thinks that something is missing or that the configurations could be improved (for people to branch off of), please let me know in this thread or make a PR or issue!
If it’s not appropriate to post updates like this through the forum, let me know!
I’m going to wait a little bit, and build an app with it, to confirm that it’s in working condition and does what I want it to do, before writing the final tutorial!
I chose Elasticsearch instead of what I’ve used in production (Redshift) because I think Elasticsearch is more affordable for smaller volumes of data. That being said, I need to learn how to actually use it to know whether it does what I’m looking for.
Let me know if you see any weirdness with this configuration or can think of a better way! Hope the Terraform configuration helps some folks to set up their initial Snowplow stack!
Nice work! I’m not sure if the Elasticsearch loader supports ES6 yet but you can probably upgrade the reference from 1.5 to 5.5 (this will give you a few extra features as well as the latest version of Kibana).
Nice!! Thanks for the tip! Changed it from 1.5 to 5.5!
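For anyone following along, the version bump amounts to a one-line change. This is only a rough sketch, assuming the domain is declared with Terraform's `aws_elasticsearch_domain` resource; the resource label `snowplow` and the domain name `snowplow-events` are placeholders for illustration and are not taken from the repository:

```hcl
# Sketch only: the resource label and domain name are hypothetical placeholders.
resource "aws_elasticsearch_domain" "snowplow" {
  domain_name = "snowplow-events"

  # Previously "1.5"; 5.5 also brings a newer Kibana.
  elasticsearch_version = "5.5"

  # Instance, storage, and access policy settings are omitted here
  # and would stay as they already are in the real configuration.
}
```

As far as I know, changing the version on an existing domain may cause Terraform to replace it, so it is worth reading the output of `terraform plan` before applying.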
By watching Kibana, I have confirmed that this pipeline, created entirely through these Terraform configurations, is working. Kibana shows events as soon as they happen: