Docker Images for running Snowplow locally

Hey, I'm looking to run the Snowplow pipeline locally with Postgres using Docker Compose. I tried a few files I found on GitHub, but it seems the certificate had expired on those. Are there any newer images? The blog post I found was dated 2017, and its images were probably the ones with the certificate errors. I'm new to Docker too, so any help would be appreciated.

Hi @ishaan812,

I’m afraid right now none of the warehouse/database loaders runs locally*, as all of them depend on either Kinesis or Pub/Sub. This is something we are looking to improve, but in the meantime I think your options are:

  • Wait until we add a database loader to Snowplow Micro. With everything on the roadmap, this might take a few months.
  • Implement a Kafka input in the Postgres loader (since the code is open source).

If you are interested in the second option, we can give you some guidance, although I would warn that the codebase is not the easiest to follow, and requires a fair bit of Scala knowledge.

*There is one exception to this — the Postgres loader is able to load data from local files, although it only does it once per run (rather than continuously scanning the files). So you could run Snowplow Micro with TSV output, store the output in a file, and then run the Postgres loader in docker to get that data into Postgres. See the input.type and input.path configuration options.
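To make the "once per run" behaviour concrete, here is a rough Python sketch of that kind of one-shot read. This is not the Postgres loader's code; the function name read_events_once and the file path passed on the command line are made up for illustration. It only assumes that the file holds one tab-separated event per line.

```python
import sys
from typing import Iterator, List


def read_events_once(path: str) -> Iterator[List[str]]:
    """Yield the tab-separated fields of each non-empty line.

    The file is read a single time, top to bottom. Lines appended
    after the read has finished are not picked up until the next run.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:  # skip blank lines
                yield line.split("\t")


if __name__ == "__main__":
    # Hypothetical usage: python read_once.py events.tsv
    for fields in read_events_once(sys.argv[1]):
        print(len(fields), "fields")
```

A continuous loader would instead keep the file (or a stream) open and poll for new lines, which is the part that currently relies on Kinesis or Pub/Sub.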

Hmm, for now we are just going with a Golang loader that we have bootlegged. It just stores data in the Postgres table we've made. Extended support for Micro would be great; I think it would make the learning curve easier. Do you recommend hosting a Snowplow pipeline on Kubernetes? I have seen other people doing it. We wanted to do this specifically to be cloud agnostic for now. We kind of have a working setup with the current Micro and Postgres, but it's not scalable.

We are working on a Kubernetes setup with a cloud-agnostic messaging framework to replace Kinesis and Pub/Sub. But this will still take some months.

Maybe some Open Source users can chime in with their experience running the pipeline (as is) in Kubernetes.


@andnig has posted a really comprehensive Kubernetes setup guide. It still uses GCP Pub/Sub for the connectivity bits. Worth a read. And great work, Andreas.