Snowplow Ruby Tracker with Async Emitter

lucasas · May 17, 2017, 9:21pm

Hi,

I’m Lucas Souza, CTO of GetNinjas.

Currently, we are using a complex architecture in order to avoid using the Snowplow Ruby Tracker with the async emitter. Why do we do that?

Basically, because we have a Rails application running on a Unicorn web server. Since Unicorn runs multiple worker processes, and those can be killed when they hit a given timeout, we are afraid of losing some events.

Our current architecture writes events to a file; Fluentd reads that file and sends the events to SQS; finally, a multi-threaded application written in Ruby fetches them from SQS and sends them to Snowplow.

It’s important to say that real-time events are a prerequisite for us. We are not achieving that with this architecture, and I think it simply has too many steps.

We would like to know if any of you have a better idea for solving this problem. Have you tried the Ruby async emitter with a Rails application running under Unicorn? Or do you have another architecture in mind for real-time events from backend applications?

Best,

alex · May 17, 2017, 9:33pm

Hi Lucas,

That does sound like a complicated pipeline! We have been mulling a couple of alternative collection architectures recently - primarily for the PHP Tracker but it sounds like they could work well for Rails/Unicorn.

Option 1: Socket collector

- Adding a socket emitter to a given tracker
- Writing a socket collector (probably in Golang or Rust) which listens on the socket and writes the events to Kinesis/Kafka/maybe S3, in our standard format
- Obviously the socket collector stays behind your firewall

Option 2: Golang tracking daemon

- Adding a socket emitter to a given tracker
- Writing a Golang daemon that runs on each box
- The Golang daemon embeds our Snowplow Golang Tracker
- The Golang daemon will cache events in e.g. RocksDB
- The Golang daemon will then send the events out to the regular HTTP collector

Do either of these sound interesting - does the community have some other ideas?

lucasas · May 18, 2017, 4:17pm

Hi Alex,

Thank you for your fast reply.

I like the second option more than the first one. But I have some questions:

- Which socket emitter do you have in mind? I saw some implementations using Redis behind the scenes.
- Does the Snowplow Golang Tracker not have a batch option?

Best,

lucasas · May 18, 2017, 5:37pm

Another option is to keep using Fluentd to read the tracking files and to write a Fluentd output plugin that sends the events to Snowplow.

What do you think?

alex · May 18, 2017, 9:13pm

If you are already invested in Fluentd, then yes that option could work too; I don’t think you’d want to introduce Fluentd just for this use case though.

On the socket emitter - I just mean writing low-level TCP socket code to emit the events.
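To make the idea of a socket emitter concrete, here is a minimal sketch in Ruby using only the standard library. It is not part of the Snowplow Ruby Tracker: the class name `SocketEmitter`, the `input` method name, and the one-JSON-object-per-line framing are all assumptions made for illustration.

```ruby
require 'json'
require 'socket'

# Hypothetical emitter, NOT part of the Snowplow Ruby Tracker.
# It writes each event payload to a TCP socket as one JSON line.
class SocketEmitter
  def initialize(host, port)
    @host = host
    @port = port
    @socket = nil
    @lock = Mutex.new # protects the socket when several threads emit
  end

  # payload: a Hash of event fields (assumed to be Tracker Protocol pairs).
  def input(payload)
    line = JSON.generate(payload) + "\n"
    @lock.synchronize do
      attempts = 0
      begin
        attempts += 1
        @socket ||= TCPSocket.new(@host, @port) # connect lazily
        @socket.write(line)
      rescue SystemCallError, IOError
        close_socket
        retry if attempts < 2 # reconnect once, then give up
        raise
      end
    end
    nil
  end

  def close
    @lock.synchronize { close_socket }
  end

  private

  def close_socket
    @socket&.close
  rescue IOError
    nil
  ensure
    @socket = nil
  end
end
```

Connecting lazily means each Unicorn worker opens its own connection after forking, rather than sharing a socket inherited from the master process. Note that a successful write only hands the bytes to the operating system; it does not confirm that the daemon has stored the event.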

The Golang Tracker would be extended and then embedded in a long-running daemon which would handle the batching, storing and sending of Snowplow events. Some ASCII art:

Rails process + Snowplow Ruby Tracker ---socket--> Golang daemon + Golang Tracker ---http--> Snowplow collector

lucasas · May 18, 2017, 9:58pm

Thanks for clarifying everything, Alex.

Don’t you think implementing a socket emitter inside the Snowplow Ruby Tracker is a kind of overhead? To me, it looks easier to just send events over a TCP connection to a Golang daemon listening on it, which formats those messages into the Snowplow format, stores them in RocksDB and, finally, sends them to the HTTP collector:

Rails + TCP Socket (logstash, for example) --> Golang Daemon + Golang Tracker --> Snowplow HTTP Collector

What do you think?

alex · May 18, 2017, 10:06pm

Are you talking performance overhead or cognitive overhead? I think the cognitive overhead of adding socket support to the Ruby Tracker is lower, because it means your client code is instrumented in the same way - using the standard Ruby Tracker API - whether you use a socket emitter or an HTTP emitter.

Performance-wise, I don’t see why there’d be any impact if the socket emitter was bundled as part of the Ruby Tracker versus being hand-rolled…

Maybe I’m misunderstanding your point?

lucasas · May 19, 2017, 1:07am

You got it. I was talking about cognitive overhead.

I agree with you when you say the client code is instrumented in the same way.

But my point is: does it make sense to have a socket that transfers data in Snowplow format, and a Golang daemon that listens on it and transforms the data before sending it to the HTTP collector?

Best,

alex · May 19, 2017, 9:38am

I guess we are saying that there are two options:

1. Transfer the data in some kind of raw format to the Golang daemon, and the Golang daemon turns this format into the Snowplow Tracker Protocol
2. Transfer the data, already in Snowplow Tracker Protocol format, to the Golang daemon, and the Golang daemon passes through the payloads as-is

Option 1 is slightly more efficient payload-wise, but it means that we have to create, document and maintain YAPF (yet another payload format), which isn’t something we have bandwidth to do. So I’d vote for option 2.
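As a rough illustration of what "passes through the payloads as-is" could look like on the daemon side, here is a small sketch. It is written in Ruby only to match the tracker's language; the thread proposes Golang for the real daemon. The class name `PassThroughBuffer`, the newline-delimited framing, and the batch size are all assumptions, and the block given to the constructor stands in for whatever actually sends a batch to the HTTP collector.

```ruby
# Hypothetical daemon-side buffer for option 2: payloads arrive already in
# Tracker Protocol format and are forwarded unchanged, grouped into batches.
class PassThroughBuffer
  # batch_size: number of payloads per batch (an illustrative setting).
  # sender: block that receives an Array of payload strings.
  def initialize(batch_size, &sender)
    @batch_size = batch_size
    @sender = sender
    @pending = []
  end

  # Accept one line exactly as read from the socket; no parsing or rewriting.
  def push(line)
    payload = line.chomp
    return if payload.empty? # ignore blank lines
    @pending << payload
    flush if @pending.size >= @batch_size
  end

  # Hand over whatever is buffered, e.g. on a timer or at shutdown.
  def flush
    return if @pending.empty?
    batch = @pending
    @pending = []
    @sender.call(batch)
  end
end

# Usage: collect batches in memory instead of sending them anywhere.
sent = []
buffer = PassThroughBuffer.new(2) { |batch| sent << batch }
buffer.push(%({"e":"pv"}\n))
buffer.push(%({"e":"se"}\n))
```

Because the daemon never interprets the payload, there is no second format to document or maintain, which is the point of option 2. This sketch keeps the buffer in memory only; the durable caching mentioned earlier in the thread (e.g. RocksDB) is left out.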

lucasas · May 19, 2017, 11:10am

Agree with you.

By the way, I’m trying the Fluentd option. I’ll report back with results today.

But if it doesn’t scale, your idea definitely looks very promising.

alex · May 19, 2017, 1:39pm

Cool, keep us posted on how you get on!
