I am using GCP to deploy the Snowplow data pipeline. I am currently using monitor.metrics.stdout in the enrich configuration file to monitor the numbers of good and bad events, but I am not sure whether there is another way to get this information out for use elsewhere. I am also wondering whether there is a similar feature in the collector, BigQuery loader, and Mutator configurations, or whether there are other tools you would suggest for monitoring.
Hi @phxtorise, it’s great that you’re using the metrics. We’re excited about this new feature in the Snowplow pipeline, because of the opportunities it brings for observing pipeline health.
As well as stdout metrics, enrich also emits metrics in statsd format. Statsd is a popular third-party open-source tool that listens for metrics sent over UDP and forwards them to a pluggable backend. We like the statsd format because its pluggable nature hopefully enables more users to explore the metrics in many different ways. As an aside, this metrics format is also understood by Datadog (though that is not a free service).
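To make the "sent over UDP" part concrete, here is a minimal sketch of what a statsd backend does: it binds a UDP socket and parses `name:value|type` lines from incoming datagrams. This is an illustrative toy, not Snowplow code, and the metric name in the example is hypothetical rather than a guaranteed name emitted by the pipeline.

```python
# Toy statsd receiver: datagrams arrive over UDP, each containing one
# or more "name:value|type" lines (e.g. a counter "c" or gauge "g").
import socket

def parse_statsd(line: str):
    """Parse one statsd line, e.g. 'snowplow.enrich.good:42|c'
    (hypothetical metric name), into (name, value, type)."""
    name, rest = line.split(":", 1)
    value, metric_type = rest.split("|", 1)
    return name, float(value), metric_type

def listen(host: str = "127.0.0.1", port: int = 8125):
    """Bind a UDP socket and yield parsed metrics as they arrive."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, _addr = sock.recvfrom(65535)
            for line in data.decode().splitlines():
                if line:
                    yield parse_statsd(line)
```

A real backend (statsd itself, Datadog's agent, Telegraf, etc.) does the same listening step and then aggregates and forwards the metrics to whatever storage or dashboard you plug in.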
There is documentation about using enrich with statsd over here; see the monitoring.metrics.statsd options in the configuration reference.
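As a rough illustration, the relevant block of the enrich HOCON configuration looks something like the sketch below. The exact field names and defaults vary between versions, so treat the values here (hostname, period, prefix, tags) as placeholders and check the configuration reference for your enrich version.

```hocon
"monitoring": {
  "metrics": {
    # Print aggregated counts (e.g. good/bad events) to stdout each period
    "stdout": {
      "period": "10 seconds"
    }
    # Also send the same metrics to a statsd server over UDP
    "statsd": {
      "hostname": "localhost"   # where your statsd agent listens
      "port": 8125
      "period": "10 seconds"
      "prefix": "snowplow.enrich."
      "tags": {
        "env": "prod"           # arbitrary tags attached to each metric
      }
    }
  }
}
```

You can enable either or both sinks; stdout is handy for eyeballing logs, while statsd lets you feed the numbers into a proper monitoring backend.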
And yes, we do support metrics in the collector and the BigQuery loader! For the BigQuery loader, here are the relevant fields of the configuration reference. We don't seem to have any documentation for the collector statsd metrics (yet - it's very new) but here is an example configuration in the GitHub repo.
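For the collector, the shape of the statsd settings is similar in spirit to enrich's. The sketch below is an assumption based on the example configuration linked above, not authoritative documentation; field names may differ in your collector version, so verify against the example in the repo.

```hocon
collector {
  # ... other collector settings ...
  monitoring {
    metrics {
      statsd {
        enabled = true
        hostname = "localhost"   # statsd agent address
        port = 8125
        period = "10 seconds"
        prefix = "snowplow.collector"
      }
    }
  }
}
```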
This isn't relevant to your GCP setup, but in case AWS pipeline users are reading this: we also have documentation for statsd metrics in the S3 loader and in the RDB loader.