A bit of context:
In the project that I’m working on, we are already calculating some metrics (in “realtime”) from web events generated by a Snowplow web tracker. Let’s focus on AvgTimeSpent.
We are using Spark Structured Streaming to perform those calculations. In those jobs, events with the same pageview_id are grouped into 1-minute windows. That pre-aggregation is then rolled up into 5-minute windows, which are finally used to calculate the metrics.
Here is a simplified example:
Incoming events:
ts ev_type page pv_id
12:00:00 page_view /home 0001
12:00:10 page_ping /home 0001
12:00:20 page_ping /home 0001
12:00:22 page_view /home 0002
12:00:32 page_ping /home 0002
1 minute aggregation:
ts page time_spent pv_id
12:00:00 /home 20 0001
12:00:00 /home 10 0002
Average time spent:
ts page avg_time_spent
12:00:00 /home 15
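To make the two steps concrete, here is a minimal sketch in plain Python (not our actual Spark job). It assumes that time spent within a window is the gap between the first and last event of a pageview, which matches the numbers above. The function names (time_spent_per_pageview, avg_time_spent) are made up for this illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Events from the example: (timestamp, event type, page, pageview id)
EVENTS = [
    ("12:00:00", "page_view", "/home", "0001"),
    ("12:00:10", "page_ping", "/home", "0001"),
    ("12:00:20", "page_ping", "/home", "0001"),
    ("12:00:22", "page_view", "/home", "0002"),
    ("12:00:32", "page_ping", "/home", "0002"),
]


def parse_ts(ts):
    return datetime.strptime(ts, "%H:%M:%S")


def floor_to_window(ts, minutes):
    """Truncate a timestamp to the start of its tumbling window.

    Only valid for window sizes that divide 60 evenly (1, 5, 10, ...).
    """
    return ts - timedelta(
        minutes=ts.minute % minutes,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )


def time_spent_per_pageview(events, window_minutes=1):
    """Step 1: per (window, page, pageview id), seconds between first and last event.

    Assumption: time spent = last event timestamp - first event timestamp.
    """
    bounds = {}
    for ts_str, _ev_type, page, pv_id in events:
        ts = parse_ts(ts_str)
        key = (floor_to_window(ts, window_minutes), page, pv_id)
        first, last = bounds.get(key, (ts, ts))
        bounds[key] = (min(first, ts), max(last, ts))
    return {
        key: int((last - first).total_seconds())
        for key, (first, last) in bounds.items()
    }


def avg_time_spent(per_pageview, window_minutes=5):
    """Step 2: roll the pre-aggregation up and average per (window, page)."""
    seconds_by_pv = defaultdict(lambda: defaultdict(int))
    for (window_start, page, pv_id), seconds in per_pageview.items():
        key = (floor_to_window(window_start, window_minutes), page)
        # Sum per pageview id so a pageview spanning several
        # 1-minute windows is still counted once.
        seconds_by_pv[key][pv_id] += seconds
    return {
        key: sum(by_pv.values()) / len(by_pv)
        for key, by_pv in seconds_by_pv.items()
    }


per_pageview = time_spent_per_pageview(EVENTS)
averages = avg_time_spent(per_pageview)
```

With the events above, per_pageview holds 20 seconds for pageview 0001 and 10 seconds for 0002, and averages holds 15.0 for /home in the 12:00:00 window.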
Calculating these metrics is possible because the web tracker sends page pings, so a 5-minute window has enough information to determine how many pageviews occurred and how much time a user spent on a page in that 5-minute interval.
Since the mobile trackers have no support for page pings, I wonder if anyone has calculated similar realtime metrics for mobile events.
I’m not looking for a solution specific to Spark Structured Streaming; we could just as well be using Flink or something else.
I’m more interested in the approach used to perform the calculation. Things like:
- Which events did you configure on the mobile tracker?
- How do you use those events to calculate your metrics?
Any help is appreciated.