Move timezone detection to backend

christoph-buente · November 28, 2018, 9:59am

When we looked into ways to minimize the size of the JavaScript tracker, we found that it carries a large lookup table to detect the browser’s timezone. I think this can clearly be moved into the backend, as the timezone information itself is not used in the tracker at all. There is a closed PR that removes the lookup table, but without the corresponding functionality in the backend, it does no good.

What would be the ideal place to fill the timezone field instead of the tracker itself? Would it be the collector as the first touchpoint? Or would it be a built-in enrichment?

Calling you all for opinions. Thanks

robkingston · November 29, 2018, 1:10am

If anything, an enrichment seems more sensible here. Adding enrichment logic to the collector just feels messy.

It will probably need a new parameter in the tracker protocol: https://github.com/snowplow/snowplow/wiki/snowplow-tracker-protocol#timestamp

And would we need to maintain hash maps across tracker implementations and enrichment, e.g. for new time zones / changes to DST? What if they go out of sync or we have to re-enrich old data?

Should we consider letting enrichment just figure it out based on raw data instead?

mike · November 29, 2018, 1:53am

I wouldn’t put it in the collector because that’s just too tricky, but I think it makes some sense in the enrichment. That said, I wonder if we could just replace the heavy lookup tables in jstimezonedetect with calls to the Intl API, e.g.,

Intl.DateTimeFormat().resolvedOptions().timeZone

It’s supported well enough in browsers now, with the exception of UC Browser, which has pretty heavy usage in China (and China only has one timezone). I’m not sure about the coverage within Android WebView, which could be an issue.
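A minimal sketch of what that could look like in the tracker, assuming we fall back to sending only the raw UTC offset when the Intl API is missing or returns nothing. The `detectTimezone` name and the shape of the returned object are hypothetical and not part of the existing tracker:

```javascript
// Hypothetical helper, not part of the Snowplow tracker.
// Returns the IANA timezone name when the browser exposes it via Intl,
// plus the current UTC offset in minutes (positive = east of UTC).
function detectTimezone() {
  // getTimezoneOffset() returns UTC minus local time, so flip the sign.
  var offsetMinutes = -new Date().getTimezoneOffset();
  var timezone = null;
  try {
    if (typeof Intl !== 'undefined' && typeof Intl.DateTimeFormat === 'function') {
      var resolved = Intl.DateTimeFormat().resolvedOptions().timeZone;
      if (typeof resolved === 'string' && resolved.length > 0) {
        timezone = resolved;
      }
    }
  } catch (e) {
    // Older browsers may throw here; leave timezone as null.
  }
  return { timezone: timezone, offsetMinutes: offsetMinutes };
}
```

With something like this, the lookup table can go away: browsers with Intl support send the name directly, and everything else sends just the offset for the backend to deal with.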

The only reason I don’t like the idea of putting this in the enricher is that it feels like this information should be provided by the underlying OS / browser and not inferred from a timezone offset. Having this as an enrichment would also mean having to maintain / update the underlying IANA databases, which would have to be pulled from S3 or a similar service.

christoph-buente · November 29, 2018, 8:55am

The reason I think it cannot be an enrichment in the classic sense is that enrichments cannot change any fields inside the canonical model, except the derived contexts. So it will either be a built-in enrichment that is on by default and does not adhere to that rule, or it is done implicitly inside the collector.

I like the idea of detecting the timezone inside the browser with native support. But we still have to backfill that information for browsers that lack support for the Intl API.

I would not be so worried about downloading a file from S3, as the mechanism exists for other parts of the pipeline. This will be yet another hosted asset, I guess.
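To illustrate why the backfill is the hard part, here is a rough sketch (in JavaScript only for illustration; the real enrichment would live in the backend) that narrows a list of candidate zones down to those matching a reported UTC offset at the event time. The function names and the candidate list are hypothetical, and it assumes a runtime with full timezone data. An offset alone usually matches several zones, so this can only narrow things down, not give a definitive answer:

```javascript
// Hypothetical helpers for illustration only.
// Compute the UTC offset (minutes, positive = east of UTC) of an IANA
// zone at a given instant, using the runtime's own timezone data.
function offsetMinutesAt(timeZone, date) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone: timeZone,
    hourCycle: 'h23',
    year: 'numeric', month: '2-digit', day: '2-digit',
    hour: '2-digit', minute: '2-digit', second: '2-digit'
  }).formatToParts(date);
  const get = function (type) {
    return Number(parts.find(function (p) { return p.type === type; }).value);
  };
  // Re-read the zone's wall-clock time as if it were UTC, then compare.
  const asUtc = Date.UTC(get('year'), get('month') - 1, get('day'),
                         get('hour'), get('minute'), get('second'));
  return Math.round((asUtc - date.getTime()) / 60000);
}

// Keep only the candidate zones whose offset at `date` equals the reported one.
function candidateZones(reportedOffsetMinutes, date, zones) {
  return zones.filter(function (zone) {
    return offsetMinutesAt(zone, date) === reportedOffsetMinutes;
  });
}
```

Because the offset is evaluated at the event's own timestamp, DST is taken into account, but the result still depends on how current the underlying timezone data is, which is exactly the maintenance concern raised above.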
