We are facing an issue with scala stream collector. We are using scala stream collector with kafka to track events. The issue is that collector_tstamp (in fact all timestams) are in UTC and we are not able to change it to our local timezone.
Is there any way of achieving this? We tried rebuilding the jar from source but that didn’t work either.
Hi @rahul although I cannot offer any suggestions on how to change the collector timestamp (because I don’t know enough), I am going to speculate that (as is usually the case with local time requirements) you convert at the presentation, rather than collection, processing or storage stages. Specifically when you use your data, you can use a library (or SQL function) to convert the proper time to a local time using location information and add a column with a local representation of the proper time. Would that work for you?
@knservis thanks for the reply. Though your suggestion makes sense but for our use case where we have all our systems in our local timezone and all the events have local timezone at source only. We have snowplow batch pipeline on production running and that too have the our local timezone. So we would like to have the consistency across the systems.
So for our use case what can we do to have collector_tstamp in our local timezone at source?
I’m in agreeance with @knservis here - you should be keeping collector_tstamp in UTC time and not local time.
At the moment the Scala stream collector uses System.currentTimeMillis as the collector_tstamp for an event and although it’s possible to modify this there are several reasons not to including
You would also have to make reference changes to anywhere bad rows are being collected (where failure_tstamp is in UTC)
Certain enrichments (e.g., currency conversion) rely on UTC timestamps
Local time is more difficult to reason about - the current schema for atomic.events sets collector_tstamp to TIMESTAMP rather than TIMESTAMPTZ. Local time doesn’t always progress forward at 1 second per second (time offsets change) whereas this is mostly guaranteed with UTC.
The collector_tstamp doesn’t help much in determining when an event occurred as it’s only indicative of when an event was received by a collector. @alex has written a comprehensive blog post on how time is calculated and processed and what timestamps are appropriate to use in what circumstances.
If you do want to convert the collector_tstamp to local time you can do this in most databases as @knservis has mentioned by using CONVERT_TIMEZONE e.g., CONVERT_TIMEZONE('Timezone/Example', collector_tstamp). It’s easy to add this as part of the data modelling Redshift that runs every ETL run using something like SQL runner or any other automation tool.