Snowplow Docker images released

We’re extremely happy to be releasing the first official Snowplow Docker images.

This release focuses on providing images for the real-time pipeline components:

  • Scala Stream Collector
  • Stream Enrich
  • Snowplow Elasticsearch Loader
  • Snowplow S3 Loader

Huge thanks to @joshuacox, @danielz and Tamas Szuromi.


Nice! I would really like to use Docker for the batch pipeline, so that I can start, update, and rerun pipelines via ChatOps.

@tclass there is a PR for having an image for EmrEtlRunner if you want to have a look:


I’m running both the stream collector and the S3 loader as Docker images. The collector works fine, but the loader dies immediately on startup with a SIGSEGV. Has anyone encountered this?

$ docker run --rm -v /opt/snowplow/conf.d:/snowplow/config --config /snowplow/config/config.hocon
log4j:WARN No appenders could be found for logger (com.amazonaws.AmazonWebServiceClient).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[main] INFO com.snowplowanalytics.s3.loader.sinks.KinesisSink - Stream snowplow-collector-cluster-production-FailedLoadStream-V64WO3F20BSJ exists and is active
# A fatal error has been detected by the Java Runtime Environment:
#  SIGSEGV (0xb) at pc=0x00000000000010d6, pid=5, tid=0x00007fc576b4eae8
# JRE version: OpenJDK Runtime Environment (8.0_131-b11) (build 1.8.0_131-b11)
# Java VM: OpenJDK 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea 3.4.0
# Distribution: Custom build (Fri Jun 16 13:41:54 GMT 2017)
# Problematic frame:
# C  0x00000000000010d6
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
# An error report file with more information is saved as:
# /tmp/hs_err_pid5.log
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:

I have a custom Docker image of the same version (0.6.0) of s3-loader that appears to work fine. It uses openjdk:8u141 as a base (so I’m not sure whether it’s an Alpine vs. Debian issue).

Thanks for any help!

The lzo package is not part of the base Alpine install, and it is not included in the image since it isn’t needed for gzip output.

There are two solutions to this issue:

  • Creating a derived image and installing the lzo package
  • Modifying the entrypoint script to install lzo
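For the first approach, a derived image can be as small as the sketch below. The base image name, tag, and the `snowplow` user are assumptions here; substitute the image reference and user your deployment actually uses.

```dockerfile
# Base image name/tag is an assumption; replace with the s3-loader image you run.
FROM snowplow/s3-loader:0.6.0

# apk needs root; the official images are assumed to run as a non-root user.
USER root

# Install the native lzo library missing from the Alpine base image.
RUN apk add --no-cache lzo

# Drop back to the non-root user (name assumed).
USER snowplow
```

Build it with `docker build -t s3-loader-lzo .` and run that image in place of the original.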

I have created a ticket to document this.

Thanks. As an aside, I had assumed that I needed to write the S3 data in LZO, which I now get the impression is wrong. It seemed strange at the time, since the Clojure collector data is clearly gzipped. But that was my assumption based on the wording of this comment in the example configuration:

I think the above might be an artifact from a previous implementation.
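For reference, the output format is controlled in the loader’s `config.hocon`; a sketch of the relevant section is below. The key names follow the example configuration shipped with the loader, and the region and bucket values are placeholders.

```hocon
s3 {
  region = "us-east-1"       # placeholder
  bucket = "my-raw-bucket"   # placeholder
  # "lzo" requires the native lzo library inside the image; "gzip" does not.
  format = "gzip"
}
```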

I created an issue to track this.