OpenTelemetry support

I have searched the Snowplow GitHub repos for support of ingesting OpenTelemetry data and couldn’t find even one mention of OpenTelemetry.

Why is there no support for ingesting OpenTelemetry data (especially traces)?


Hi @Ovidiu_Buligan,

I’m a little confused by how the question was presented - why would it be surprising for us not to mention a particular telemetry provider in our repos or docs? Why would it be a matter of justifying why we don’t support it, rather than a matter of making the case for why we should support it?

Snowplow is primarily used for generating data about user behaviours in digital products - but it can be used for a lot of other use cases. If your goal is to ingest data from a third-party telemetry platform, then you can do so via custom schemas and webhooks - the same applies to any third-party data source.

Hope that’s helpful.


Can Snowplow add support for the OpenTelemetry protocol (OpenTelemetry Protocol | OpenTelemetry)?
OpenTelemetry makes it simpler to support multiple telemetry platforms at the same time. OpenTelemetry has a very popular component, the Collector, that can export to different telemetry formats: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter .
A lot of filtering, sampling and configuration can be done in this component in a unified way. The Collector can be configured to export to multiple telemetry providers at the same time, to get the best of several platforms. For example, we use App Insights (for user insights) and Grafana Tempo (for better trace views). We would like to explore adding Snowplow as a sink.
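As a rough illustration of that fan-out idea, here is a minimal sketch in Python using the OpenTelemetry SDK (at the application level rather than in the Collector’s configuration) that attaches two exporters so every span is sent to two backends at once; the endpoints are placeholders:

# Sketch only: SDK-level fan-out to two OTLP/HTTP endpoints.
# The Collector achieves the same thing declaratively via its exporters section.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()

# One span processor per backend, e.g. a local Collector and a Tempo instance.
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
)
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://tempo.example.com:4318/v1/traces"))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("demo")
with tracer.start_as_current_span("example-span"):
    pass  # this span is exported to both endpoints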

From a quick read of the OpenTelemetry docs, it seems that it can send the data via an HTTP POST request, and it has a standardised structure for what that data looks like.

If I’m correct about that, then I think we support this already - you can send that data through your Snowplow pipeline as follows:

  • Define a JSON schema for the data you want to send through the pipeline.
  • Configure OpenTelemetry to send that data via the Iglu Webhook, referencing the schema you have created (which can be done via the querystring if you can’t specify the content of the request); a minimal sketch of such a request follows below.
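For illustration, here is a minimal sketch in Python of that second step: posting a span-like JSON payload to the collector’s Iglu webhook adapter, with the schema referenced in the querystring. The collector hostname, the schema URI and the payload fields are placeholders, not an agreed design:

import json
import requests

# Placeholders: your own collector endpoint and the schema you defined in step 1.
COLLECTOR = "https://collector.example.com"
SCHEMA = "iglu:com.example/otel_span/jsonschema/1-0-0"

span = {
    "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
    "spanId": "00f067aa0ba902b7",
    "name": "GET /checkout",
    "kind": "SPAN_KIND_SERVER",
    "durationNanos": 1250000,
}

# The Iglu webhook adapter reads the schema URI from the querystring and wraps
# the JSON body into a self-describing event for the rest of the pipeline.
resp = requests.post(
    f"{COLLECTOR}/com.snowplowanalytics.iglu/v1",
    params={"schema": SCHEMA},
    json=span,
    timeout=5,
)
resp.raise_for_status()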

I hope that helps!


@Ovidiu_Buligan if you do end up developing schemas that match the OpenTelemetry spec and feel like contributing them back, we would love to help get them into Iglu Central → GitHub - snowplow/iglu-central: Contains all JSON Schemas, Avros and Thrifts for Iglu Central

For anyone wanting to implement this, maybe you can get inspired by looking at the default schema ClickHouse uses for OTel.
Table for traces:

CREATE TABLE default.otel_traces
(
    `Timestamp` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
    `TraceId` String CODEC(ZSTD(1)),
    `SpanId` String CODEC(ZSTD(1)),
    `ParentSpanId` String CODEC(ZSTD(1)),
    `TraceState` String CODEC(ZSTD(1)),
    `SpanName` LowCardinality(String) CODEC(ZSTD(1)),
    `SpanKind` LowCardinality(String) CODEC(ZSTD(1)),
    `ServiceName` LowCardinality(String) CODEC(ZSTD(1)),
    `ResourceAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `ScopeName` String CODEC(ZSTD(1)),
    `ScopeVersion` String CODEC(ZSTD(1)),
    `SpanAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `Duration` Int64 CODEC(ZSTD(1)),
    `StatusCode` LowCardinality(String) CODEC(ZSTD(1)),
    `StatusMessage` String CODEC(ZSTD(1)),
    `Events.Timestamp` Array(DateTime64(9)) CODEC(ZSTD(1)),
    `Events.Name` Array(LowCardinality(String)) CODEC(ZSTD(1)),
    `Events.Attributes` Array(Map(LowCardinality(String), String)) CODEC(ZSTD(1)),
    `Links.TraceId` Array(String) CODEC(ZSTD(1)),
    `Links.SpanId` Array(String) CODEC(ZSTD(1)),
    `Links.TraceState` Array(String) CODEC(ZSTD(1)),
    `Links.Attributes` Array(Map(LowCardinality(String), String)) CODEC(ZSTD(1)),
    INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1,
    INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_span_attr_key mapKeys(SpanAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_span_attr_value mapValues(SpanAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_duration Duration TYPE minmax GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, SpanName, toUnixTimestamp(Timestamp), TraceId)
TTL toDateTime(Timestamp) + toIntervalDay(3)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1

Source:
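To make the schema-contribution idea above concrete, here is a rough sketch (expressed in Python and printed as JSON) of what an Iglu self-describing schema for such a span payload might look like, loosely mirroring the ClickHouse columns; the vendor, name and field selection are assumptions, not an agreed design:

import json

# Draft Iglu self-describing JSON schema for an OTel-style span.
# Vendor/name/version are placeholders; the field list loosely mirrors the
# ClickHouse otel_traces columns shown above.
otel_span_schema = {
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "description": "Schema for an OpenTelemetry trace span",
    "self": {
        "vendor": "com.example",
        "name": "otel_span",
        "format": "jsonschema",
        "version": "1-0-0",
    },
    "type": "object",
    "properties": {
        "traceId": {"type": "string"},
        "spanId": {"type": "string"},
        "parentSpanId": {"type": "string"},
        "name": {"type": "string"},
        "kind": {"type": "string"},
        "serviceName": {"type": "string"},
        "startTime": {"type": "string", "format": "date-time"},
        "durationNanos": {"type": "integer"},
        "statusCode": {"type": "string"},
        "statusMessage": {"type": "string"},
        "resourceAttributes": {"type": "object"},
        "spanAttributes": {"type": "object"},
    },
    "required": ["traceId", "spanId", "name"],
    "additionalProperties": False,
}

print(json.dumps(otel_span_schema, indent=2))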