Hey all,
Strange issue here. I have installed the Snowplow collector, enrich, Redshift, s3-sink and rdb-loader with Terraform and Docker containers (AWS Fargate).
To my understanding, to run RDB Loader (post R35) you recommend either dataflow-runner or a boto3 script (R35 Upgrade Guide - Snowplow Docs).
For now, due to external factors, I can’t use a boto3 script (or Lambda, for that matter).
So instead of using an EC2 server to run the script, I planned to run dataflow-runner in a simple Docker container.
Based on the documentation, if I use 64-bit Linux I should be able to run dataflow-runner with no other dependencies.
I have no trouble getting the container up and running with a bash script as the launch script (I use that for all Docker images), but I can’t get dataflow-runner to run. The response I get is “not found”, even though I have verified that the file is executable and has the right permissions. Some suggestions on Stack Overflow are that this happens because of missing dependencies.
Has anyone got dataflow-runner to work in a Docker image? Any ideas would be welcome…
Thanks, F
Could you share the script that runs dataflow-runner, please? Have you tried doing all the steps manually in a Docker container to make sure that it works?
FROM snowplow/base-alpine as builder
#RUN apk update && apk upgrade && apk add bash && apk add bash-completion
WORKDIR /snowplow
COPY launch.sh /snowplow/launch.sh
COPY playbook.json /snowplow/playbook.json
COPY cluster.json /snowplow/cluster.json
RUN wget http://dl.bintray.com/snowplow/snowplow-generic/dataflow_runner_0.5.0_linux_amd64.zip
RUN unzip dataflow_runner_0.5.0_linux_amd64.zip
FROM snowplow/base-alpine
RUN apk update && apk upgrade && apk add bash
WORKDIR /snowplow
COPY --from=builder /snowplow /snowplow
RUN chmod +x launch.sh
#RUN chown snowplow:snowplow launch.sh
#RUN echo ${PATH}
#RUN ls -la
ENTRYPOINT [ "./launch.sh" ]
launch.sh is
#!/bin/bash
echo "in script"
ls -la
pwd
./dataflow-runner help
#./dataflow-runner run-transient --emr-config=cluster.json --emr-playbook=playbook.json
#run-transient Launches, runs and then terminates an EMR cluster
And finally the output from the script part is
snowplow-dataflow-runner | in script
snowplow-dataflow-runner | ./launch.sh: line 5: ./dataflow-runner: not found
snowplow-dataflow-runner | total 28652
snowplow-dataflow-runner | drwxr-xr-x 1 snowplow snowplow 4096 Feb 20 15:17 .
snowplow-dataflow-runner | drwxr-xr-x 1 root root 4096 Feb 20 15:17 ..
snowplow-dataflow-runner | drwxr-xr-x 1 snowplow snowplow 4096 Oct 29 15:47 bin
snowplow-dataflow-runner | -rw-r--r-- 1 root root 1987 Feb 17 17:43 cluster.json
snowplow-dataflow-runner | drwxr-xr-x 2 snowplow snowplow 4096 Oct 29 15:47 config
snowplow-dataflow-runner | -rwxr-xr-x 1 root root 20789708 Feb 20 15:17 dataflow-runner
snowplow-dataflow-runner | -rw-r--r-- 1 root root 8518063 Aug 24 15:55 dataflow_runner_0.5.0_linux_amd64.zip
snowplow-dataflow-runner | -rwxr-xr-x 1 root root 214 Feb 20 15:17 launch.sh
snowplow-dataflow-runner | -rw-r--r-- 1 root root 1483 Feb 17 18:23 playbook.json
snowplow-dataflow-runner | /snowplow
snowplow-dataflow-runner exited with code 127
Hopefully it is something simple I am missing, but I have tried changing directories, permissions, the path, etc.
Any insight is welcome …
Hey Ben,
Many thanks for the help. It turns out that “linux-vdso.so.1” is not in the kernel of base-alpine, so I updated my Dockerfile to use base-debian. When I run the code now, it prints the output of the help command…
Many thanks!
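One possible reading of the fix above: the released binary appears to be dynamically linked against glibc, and a shell reports "not found" when the loader named inside an executable is absent, even though the executable itself exists. Alpine images are musl-based and normally do not ship the glibc loader. The sketch below is only a heuristic, and the helper names `requested_loader` and `check_loader` are hypothetical; where available, `ldd` or `readelf -l` give a more reliable answer.

```shell
# Heuristic sketch: print the dynamic loader path embedded in an ELF binary.
# Dynamically linked binaries store it near the start of the file, for example
# /lib64/ld-linux-x86-64.so.2 (glibc) or /lib/ld-musl-x86_64.so.1 (musl).
requested_loader() {
  # Split the first 64 KiB on NUL bytes and keep the first loader-like path.
  head -c 65536 "$1" | tr '\000' '\n' | grep -a -E '^/lib.*/ld-.*\.so' | head -n 1
}

# Report whether the loader the binary asks for exists in this image.
check_loader() {
  loader=$(requested_loader "$1")
  if [ -z "$loader" ]; then
    echo "no loader path found (possibly statically linked)"
  elif [ -e "$loader" ]; then
    echo "loader present: $loader"
  else
    echo "loader missing: $loader"
  fi
}

# Usage inside the container (path taken from the listing above):
# check_loader /snowplow/dataflow-runner
```

If this prints "loader missing", the options are a base image that ships that loader or a binary that does not need one.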
I’m facing the same problem, and it does work with base-debian, but is there any smaller image that could run dataflow-runner?
I’ve tried busybox:stable-glibc, but it wasn’t enough.
Yes, it works, thanks!
I’ve added it to the make command instead of tampering with the Makefile content:
make cli-linux -e CGO_ENABLED=0
Now it’s able to run on alpine:3.15 using a multi-stage build.
Adding the Dockerfile:
FROM golang:bullseye as Builder
WORKDIR /src
RUN apt-get update && apt-get install -y unzip zip
ENV DATAFLOW_RUNNER_VERSION=0.5.1
RUN wget https://github.com/snowplow/dataflow-runner/archive/refs/tags/${DATAFLOW_RUNNER_VERSION}.zip
RUN unzip *
RUN make --directory=dataflow-runner-${DATAFLOW_RUNNER_VERSION} cli-linux -e CGO_ENABLED=0
RUN unzip dataflow-runner-${DATAFLOW_RUNNER_VERSION}/build/bin/*.zip
FROM alpine:3.15
WORKDIR /snowplow
COPY ./config /snowplow/config
COPY --from=Builder /src/dataflow-runner ./
ENTRYPOINT ./dataflow-runner run-transient --emr-config ./config/cluster.json --emr-playbook ./config/playbook.json --vars JSON_RESOLVER_VALUE,${RESOLVER_BASE64},CONFIG_HOCON_VALUE,${CONFIG_HOCON_BASE64},ENV_VAR,${ENVIRONMENT}
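The shell-form ENTRYPOINT above interpolates three environment variables into `--vars`; if one of them is unset, it expands to an empty string without any warning. A minimal sketch of a guard follows, assuming a launch script is used instead of the inline command. `build_vars` is a hypothetical helper, not part of dataflow-runner, and it is written for POSIX `sh` because alpine does not include bash by default.

```shell
# Hypothetical helper (not part of dataflow-runner): build the --vars value
# from the three environment variables used in the ENTRYPOINT above,
# failing early if any of them is unset or empty.
build_vars() {
  for name in RESOLVER_BASE64 CONFIG_HOCON_BASE64 ENVIRONMENT; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      return 1
    fi
  done
  printf 'JSON_RESOLVER_VALUE,%s,CONFIG_HOCON_VALUE,%s,ENV_VAR,%s\n' \
    "$RESOLVER_BASE64" "$CONFIG_HOCON_BASE64" "$ENVIRONMENT"
}

# Usage in a launch script (same flags as the ENTRYPOINT above):
# vars=$(build_vars) || exit 1
# exec ./dataflow-runner run-transient --emr-config ./config/cluster.json \
#   --emr-playbook ./config/playbook.json --vars "$vars"
```

Failing before the cluster is launched is cheaper than discovering an empty value after the EMR steps have started.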