Monitoring of Snowplow


I am currently setting Snowplow up on AWS, and it is really cool.
In the Scala collector and stream enricher I see some monitoring configuration.
I don’t see a lot of documentation on this.

How are these monitoring handles supposed to be used?

Hey @khebbie! Good question - this is just regular Snowplow tracking embedded in the apps for monitoring purposes.

You can just point the monitoring section at another Snowplow instance, or indeed a Snowplow Mini.
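For example, in Stream Enrich’s config.hocon the monitoring block can be pointed at the collector of a second pipeline or a Snowplow Mini. This is a sketch only, with a hypothetical Mini hostname and app ID; check the config.hocon.sample shipped with your version for the exact field names:

```hocon
# Sketch only - verify field names against the config.hocon.sample
# for your Stream Enrich version.
monitoring {
  snowplow {
    # Hypothetical Snowplow Mini collector endpoint
    collectorUri = "snowplow-mini.example.com"
    collectorPort = 80
    appId = "enrich-monitoring"  # hypothetical app ID for these events
    method = GET
  }
}
```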



I am trying to set up some monitoring for my Snowplow instance, but can’t seem to find the handles mentioned by @khebbie inside of the Scala Collector. We used this config sample as our basis, but there isn’t anything in there other than Prometheus metric setup. I would like to point our Collector to our Snowplow-Mini instance to collect its health data.

Thanks for any help from anyone.

Hey @abrhim - the original thread relates to the Snowplow AWS batch pipeline. I’ve updated the thread topic to make that clear.

Okay, cool. So then, for the collector we are working with, is there an alternative to using Prometheus, say, a Snowplow Mini?


If I am not wrong, you want to monitor stream data? If yes, then you can use Elasticsearch for monitoring. There is an add-on called “snowplow-elasticsearch-loader”; with it you can monitor your stream data in Elasticsearch.

Your data flow would be:

Collector --> Kinesis stream --> snowplow-elasticsearch-loader --> Elasticsearch
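As a rough illustration, the loader is driven by a HOCON config that names the Kinesis input and the Elasticsearch cluster to write to. The field names below are illustrative assumptions, not the real schema; consult the sample config in the snowplow-elasticsearch-loader repository for the actual structure:

```hocon
# Illustrative sketch only - see the sample config in the
# snowplow-elasticsearch-loader repo for the actual field names.
source = "kinesis"            # read records from a Kinesis stream

elasticsearch {
  client {
    endpoint = "elasticsearch.example.com"  # hypothetical ES host
    port = 9200
  }
  cluster {
    index = "snowplow"        # hypothetical index for enriched events
  }
}
```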

No, I want to have a system to monitor the health of my already existing Snowplow pipeline. This article gives an outline of what metrics are available to be sent out of the Snowplow pipeline for another application to ingest. I have provisioned a Snowplow-mini and successfully connected it to our Enricher, but I am struggling to find any way to have our Collector push any metrics to it. I have tried simply putting this in the configure.hocon in our Scala Collector, but it doesn’t seem to be sending out any metrics. Once I figure this out I would love to write up some documentation for others as well. I will make a new thread about this to avoid irrelevancy.

Currently the only metrics exposed by the collector are via a Prometheus endpoint. There’s no push metric functionality within the collectors at the moment.
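Since the collector only exposes metrics for pulling, one option is to scrape them with Prometheus itself. A minimal scrape config sketch, assuming a hypothetical collector host and port (Prometheus scrapes the default /metrics path unless told otherwise):

```yaml
# prometheus.yml fragment - the target host/port is a placeholder
# for wherever your collector is actually deployed.
scrape_configs:
  - job_name: "snowplow-collector"
    static_configs:
      - targets: ["collector.example.com:8080"]
```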

@mike, thanks for the reply. Do you know why that is? It seems odd that all other components of Snowplow are enabled to send out metrics to another Snowplow instance but not the collector.

Would you or anyone else know of a way for the collector to send metrics out to another Snowplow instance?

The best way at the moment would be to make a pull request as an RFC to the Scala Stream Collector, exposing the sorts of metrics / dimensions that you are after.


There are a couple of projects on our roadmap related to monitoring and metrics. We’re updating the format of our bad rows to give better visibility of where and why the data couldn’t be processed (A new bad row format). We also have plans to emit volume and latency metrics directly from the components, rather than observing this from the streams. Hopefully both will be released in the next quarter. If there are other metrics that would be of use to emit from the components, do let us know!
