Snowplow Docker images released

BenFradet · October 13, 2017, 1:20pm

We’re extremely happy to be releasing the first official Snowplow Docker images.

This release focuses on providing images for the real-time pipeline components:

Scala Stream Collector
Stream Enrich
Snowplow Elasticsearch Loader
Snowplow S3 Loader

Huge thanks to @joshuacox, @danielz and Tamas Szuromi.

tclass · October 13, 2017, 2:14pm

Nice, I would really like to use Docker for the batch pipeline, so that I can start, update, and rerun pipelines via ChatOps.

BenFradet · October 16, 2017, 8:44am

@tclass there is a PR for having an image for EmrEtlRunner if you want to have a look:

rbolkey · October 27, 2017, 5:37pm

I’m running both the stream collector and the s3 loader as docker images. The collector works fine, but the loader dies immediately upon startup with a SIGSEGV. Has anyone encountered this?

$ docker run --rm -v /opt/snowplow/conf.d:/snowplow/config snowplow-docker-registry.bintray.io/snowplow/s3-loader:0.6.0 --config /snowplow/config/config.hocon
log4j:WARN No appenders could be found for logger (com.amazonaws.AmazonWebServiceClient).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[main] INFO com.snowplowanalytics.s3.loader.sinks.KinesisSink - Stream snowplow-collector-cluster-production-FailedLoadStream-V64WO3F20BSJ exists and is active
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00000000000010d6, pid=5, tid=0x00007fc576b4eae8
#
# JRE version: OpenJDK Runtime Environment (8.0_131-b11) (build 1.8.0_131-b11)
# Java VM: OpenJDK 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea 3.4.0
# Distribution: Custom build (Fri Jun 16 13:41:54 GMT 2017)
# Problematic frame:
# C  0x00000000000010d6
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid5.log
#
# If you would like to submit a bug report, please include
# instructions on how to reproduce the bug and visit:
#   http://icedtea.classpath.org/bugzilla
#

I have a custom Docker image of the same version (0.6.0) of s3-loader that appears to work fine. It uses openjdk:8u141 as a base (so I'm not sure if it’s an Alpine vs. Debian issue).

Thanks for any help!

BenFradet · October 30, 2017, 12:08pm

The lzo package is not part of the base alpine install and is not part of the image since it’s not necessary for gzip.

There are two solutions to this issue:

Creating a derived image and installing the lzo package
Modifying the entrypoint script to install lzo

I have created a ticket to document this.
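
A minimal sketch of the first option (a derived image), not a tested recipe. It assumes the Alpine package is simply named `lzo` and that the build step is allowed to install packages in the official image; if the image switches to a non-root user, the install step would need adjusting.

```dockerfile
# Derived image: start from the official S3 Loader image used in this thread.
FROM snowplow-docker-registry.bintray.io/snowplow/s3-loader:0.6.0

# Install the LZO library that the base Alpine install lacks.
# Assumption: the Alpine package is named "lzo" and this step runs as root.
RUN apk add --no-cache lzo
```

Built with something like `docker build -t s3-loader-lzo:0.6.0 .` (a hypothetical tag), the derived image could then replace the official one in the `docker run` command quoted earlier in the thread.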

rbolkey · October 31, 2017, 4:30pm

Thanks. As an aside, I had assumed that I needed to write S3 data in LZO, which I now get the impression is wrong. It seemed strange at the time, since the Clojure collector data is clearly gzipped, but that was my assumption based on the wording of this comment in the example configuration:

https://github.com/snowplow/snowplow-s3-loader/blob/3f56e5ee7c516c4e47900b3da27946356f18a765/examples/config.hocon.sample#L87


    recordLimit = {{bufferRecordThreshold}}
    timeLimit = {{bufferTimeThreshold}} # Not supported by NSQ; will be ignored
  }
}

s3 {
  region = "{{s3Region}}"
  bucket = "{{s3bucket}}"

  # Format is one of lzo or gzip
  # Note, that you can use gzip only for enriched data stream.
  format = "{{format}}"

  # Maximum Timeout that the application is allowed to fail for
  maxTimeout = {{maxTimeout}}
}

# Optional section for tracking endpoints
monitoring {
  snowplow{
    collectorUri = "{{collectorUri}}"
BenFradet · November 1, 2017, 10:38am

I think the above might be an artifact from a previous implementation.

I created https://github.com/snowplow/snowplow-s3-loader/issues/121 to track the issue.
