Data modeling using Map Reduce

It’s worth reading the recent RFC from @alex on porting the Snowplow pipeline to GCP. It makes more of a move away from Lambda by moving stream processing into Beam/Dataflow rather than relying on something like EMR.

There are still some tricky issues around streaming (like deduplication and exactly-once semantics), but it certainly looks like an interesting way forward for analytics infrastructure.
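
To give a feel for why deduplication is tricky in a streaming setting, here is a minimal sketch in plain Python. It is not how Snowplow or Beam/Dataflow actually handle it. The `BoundedDeduplicator` class, the `deduplicate` helper, and the `event_id` field are all hypothetical names made up for illustration. The idea is that a stream processor can only remember a bounded set of IDs, so a duplicate that arrives after its ID has been evicted slips through.

```python
from collections import OrderedDict


class BoundedDeduplicator:
    """Remembers at most `capacity` recently seen event IDs (hypothetical helper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._seen = OrderedDict()  # insertion order doubles as recency order

    def is_new(self, event_id):
        """Return True if event_id is not in the remembered set."""
        if event_id in self._seen:
            # Known duplicate: refresh its recency and reject it.
            self._seen.move_to_end(event_id)
            return False
        self._seen[event_id] = True
        if len(self._seen) > self.capacity:
            # State is bounded, so forget the least recently seen ID.
            self._seen.popitem(last=False)
        return True


def deduplicate(events, capacity):
    """Filter a sequence of event dicts, keeping those whose ID looks new."""
    dedup = BoundedDeduplicator(capacity)
    return [e for e in events if dedup.is_new(e["event_id"])]


if __name__ == "__main__":
    stream = [{"event_id": i} for i in ["a", "b", "a", "c", "d", "a"]]
    kept = deduplicate(stream, capacity=2)
    print([e["event_id"] for e in kept])
```

With room for only two IDs, the second `a` is caught, but the third `a` arrives after `a` has been evicted and is let through again. Making the state larger only pushes the problem further out, which is part of why exactly-once processing in a stream is harder than in a batch job that can see the whole dataset.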
