Snowplow monitoring tools

Hi Snowplowers,

I am using GCP to deploy the Snowplow data pipeline. I am currently using monitoring.metrics.stdout in the enrich configuration file to monitor the number of good and bad events, but I am not sure whether there is any other way to get this information out of the logs so I can use it elsewhere. I am also wondering whether there is a similar option in the collector, BigQuery loader, and mutator configurations. Or are there other tools you would suggest for monitoring?


Hi @phxtorise, it’s great that you’re using the metrics. We’re excited about this new feature in the Snowplow pipeline, because of the opportunities it brings for observing pipeline health.

As well as stdout metrics, enrich also emits metrics in statsd format. Statsd is a popular third-party open source tool that listens for events sent over UDP and forwards them to a pluggable backend. We like the statsd format because its pluggable nature means we hopefully enable more users to explore the metrics in many different ways. Datadog also understands this metrics format, although Datadog itself is not free.
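If you have not seen the statsd wire format before, here is a minimal Python sketch of what a listener on the receiving end might look like. It is only an illustration, not how statsd or Datadog is implemented: it assumes the common `<name>:<value>|<type>` line format with optional Datadog-style `|#key:value` tags, and the metric name, tag names, host, and port used here are made up for the example rather than taken from the enrich documentation.

```python
import socket
from typing import Dict, Iterator, NamedTuple


class Metric(NamedTuple):
    name: str
    value: float
    kind: str  # "c" = counter, "g" = gauge, "ms" = timer
    tags: Dict[str, str]


def parse_statsd_line(line: str) -> Metric:
    """Parse one line shaped like <name>:<value>|<type>[|@rate][|#k:v,...]."""
    head, *sections = line.strip().split("|")
    name, _, raw_value = head.rpartition(":")
    if not name or not sections:
        raise ValueError(f"not a statsd line: {line!r}")
    tags: Dict[str, str] = {}
    for section in sections[1:]:
        # Sections starting with "#" carry Datadog-style tags; others
        # (such as a "@0.1" sample rate) are ignored in this sketch.
        if section.startswith("#"):
            for tag in section[1:].split(","):
                if tag:
                    key, _, val = tag.partition(":")
                    tags[key] = val
    return Metric(name, float(raw_value), sections[0], tags)


def listen(host: str = "127.0.0.1", port: int = 8125) -> Iterator[Metric]:
    """Yield metrics received over UDP. Host and port are assumptions."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, _addr = sock.recvfrom(65535)
            # One datagram may contain several newline-separated metrics.
            for line in data.decode("utf-8", errors="replace").splitlines():
                try:
                    yield parse_statsd_line(line)
                except ValueError:
                    continue  # skip blank or malformed lines
```

Looping over `listen()` and printing each `Metric` is a quick way to check locally that a component is emitting metrics before pointing it at a real statsd server or the Datadog agent.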

There is documentation about using enrich with statsd over here; see also the monitoring.metrics.statsd options in the configuration reference.

And yes, we do support metrics in the collector and BigQuery loader! For the BigQuery loader, here are the relevant fields of the configuration reference. We don’t seem to have any documentation for the collector statsd metrics (yet - it’s very new) but here is an example configuration in the GitHub repo.

Not for you, but in case AWS pipeline users are reading this… we also have documentation for statsd metrics in the S3 loader and in the RDB loader.


I am currently investigating monitoring approaches and have never worked with statsd. We are using Datadog, so statsd would be the obvious choice.

Question: I am having a hard time understanding the benefits of statsd vs. GCP Cloud Monitoring policies (which are in place for BDP customers). Are the metrics basically the same, with statsd simply being the standard way to consume the data in third-party tools like Datadog? If so, it’s probably not worth the effort to develop a Datadog consumer if GCP Cloud Monitoring offers that out of the box?

Kind regards.