I’ve done a little bit of a check on a few datasets and can see something similar across both the iOS and Android trackers.
It’d be somewhat explainable if it were all clustered around a single date like 2000-01-01, but it seems to be reasonably spread across the year 2000, and the dvce_sent_tstamps look correct (in theory both timestamps should come from System.currentTimeMillis() most of the time). I can find years ranging from the 1900s to the 2000s, but there seems to be an uptick in events on YEAR-01-01, which suggests there could be a truncation / precision issue, or maybe it’s just a consistent issue with incorrect clocks.
I’m flagging this to be looked into in case there is a problem in the trackers themselves.
I’m quite unsure though - as you mention, Mike, both dvce_ tstamps come from the same method and clock, so if one is incorrect we would expect to see both be incorrect.
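For what it’s worth, here’s a minimal sketch (an assumption on my part, not the actual tracker code) of the point above: both device timestamps are read from the same System.currentTimeMillis() clock, just at different moments in the event’s lifecycle.

```java
// Minimal sketch, not the actual tracker implementation: both device
// timestamps come from the same clock via System.currentTimeMillis(),
// just at different points in the event's lifecycle.
public class DeviceTimestampSketch {

    public static void main(String[] args) throws InterruptedException {
        // Captured when the event is built on the device.
        long dvceCreatedTstamp = System.currentTimeMillis();

        // ... event sits in the local queue until the device can send it ...
        Thread.sleep(100);

        // Captured when the stored event is actually sent to the collector.
        long dvceSentTstamp = System.currentTimeMillis();

        // If the device clock were simply wrong, both values would carry the
        // same offset; seeing only dvce_created_tstamp off is what's puzzling,
        // unless the clock changed between these two reads (e.g. an NTP sync
        // after coming back online).
        System.out.println(dvceCreatedTstamp + " -> " + dvceSentTstamp);
    }
}
```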
I have seen similar confusing data before from mobile, which turned out to be down to a combination of airplane mode and incorrect clocks - consider the case where someone’s battery dies and is recharged while on airplane mode/offline. What we saw in that case was that their clock reset, events were created with odd dvce_created values, then when those users came back online their clocks synced and the data was sent with normal dvce_sent values.
I’m not sure if that explanation covers these cases. It would explain why the data seems to be randomly distributed and so rare, but as you point out, Mike, I would also expect to see the data converge around a date like ‘XXXX-01-01’, or some other date which those devices default to.
One last thing I think is worth digging into on this one - what OS versions are we seeing for the data with the weird timestamps?
Hi @camerondavison,
That’s weird. I wonder if the timestamp issue happens only to a small group of events.
Does it happen only for a specific kind of event? And if so, for all of them or just some of them?
I wonder if this could be the cause, or at least related to the issue.
Apparently, Galaxy Note 8 devices updated to Android 9 had issues with the date-time. That would probably explain the incidence on those devices/OS versions.
We are using structured events with contexts for everything. Is that what you are asking?
Yeah, sorry, I’ve been a bit cryptic.
I wonder if there is a common pattern among those wrong event timestamps that would help narrow down the source of the problem.
If those wrong timestamps come from the same type of event, generated in a specific part of the app, they could be caused by a .timestamp(...) call mistakenly added somewhere in the code during event building.
(we recently deprecated that method as it was the cause of issues like this one)
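To illustrate what that kind of mistake would look like, here is a hypothetical sketch; the class and builder names are assumptions and the deprecated API varied between tracker versions, so treat it as an example of the pattern rather than code from your app.

```java
// Hypothetical sketch of a stray .timestamp(...) call during event building.
// Class and method names are illustrative; the deprecated API differed
// between tracker versions.
Structured event = Structured.builder()
        .category("checkout")
        .action("purchase")
        // A hard-coded or stale value passed here would silently override the
        // device-created timestamp the tracker would otherwise set itself:
        .timestamp(946684800000L) // 2000-01-01T00:00:00Z in epoch millis
        .build();

tracker.track(event);
```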
If, instead, it isn’t related to the event type but is strongly connected to the device/OS type, it could be caused by an issue with the device’s clock/timezone, but in that case I would expect the sent timestamp to be wrong too.
Another possibility is that the timestamp is correct when the event is created but becomes wrong when the event is stored in and extracted from the SQLite db used by the tracker.
This seems unlikely, because I would expect it to affect all the events from the same user or session, not just some of them at random.
I think those events should have a common root cause but, at the moment, I don’t have any clues that help me figure out the possible source of the issue.
Please let us know if you spot any further detail that could help shed light on this problem.
Yes, it’s strange that in some circumstances only one of the timestamps is off.
From what I can tell this occurs across both the Android and iOS trackers (multiple versions of the tracker, multiple versions of iOS / Android) and across different sets of events (e.g., struct / self-describing / screen view).
On further inspection I can see that it’s also possible for this to occur in the JavaScript tracker, but only on certain devices (mostly Smart TVs as far as I can tell), which doesn’t eliminate SQLite as the culprit but points more towards weird clock behaviour (where the created_tstamp is off but dvce_sent is fine in some circumstances). I can also find created_tstamps several decades in the future.
Looking at some of the data, this starts to feel more and more like a potential bug in either the operating system or the hardware itself (some of these devices have very low-cost RTCs), rather than in the trackers.
We just implemented this a couple of weeks ago, so I can assert that we are not using .timestamp(...) anywhere; grepping the code, I do not see it being used.
Alternatively, would it make sense for the enricher to be able to filter out these events? They are showing up as bad BigQuery inserts, since that table is partitioned by time and those rows can’t be loaded into BigQuery.
Looking at some of the data, this starts to feel more and more like a potential bug in either the operating system or the hardware itself (some of these devices have very low-cost RTCs), rather than in the trackers.
Yeah, I’m starting to feel this way too. While Postgres might be the explanation, it strikes me as super weird that it’s only that field that gets affected. Surely if Postgres encoding were at play here we’d see it happen for any similarly typed fields…
Alternatively, would it make sense for the enricher to be able to filter out these events?
From the purist point of view, I’m not sure it does - validation only checks the types and constraints described by the JSON Schema, and this data is technically valid, it just has weird values. One might consider adding some kind of ‘minimum’ constraint here, but a big part of the philosophy is that data from any time can be reprocessed again - so in theory any such constraint conflicts a bit with that design. So while I see the point of view, personally I’d be hesitant to do this in the core validation of the pipeline (if it were my choice, at least).
From the practical point of view I think it’s an absolutely fair ask and a good idea - custom validation rules have been raised to us before in a different context, and it’s an idea we’re enthusiastic about. At the moment it’s not on the roadmap, as we have a lot of more immediate delivery work, so it’s still a question of if, not when. (For the record, I’m not involved in making roadmap decisions, I’m just giving my opinion on how things look wrt that feature.) Personally I’m super excited about the idea though!
They are showing up as bad BigQuery inserts, since that table is partitioned by time and those rows can’t be loaded into BigQuery.
I would say that this is not intuitive, but it is actually the expected behaviour. If you’d like to work around it, setting the true_tstamp will override the derived_tstamp to whatever you set it to (derived_tstamp is normally calculated using the dvce_created, dvce_sent and collector tstamps).
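For anyone following along, here is a rough sketch of that calculation as I understand it (not the actual enrichment code): the device clock skew is estimated from the two device timestamps and subtracted from the collector timestamp, unless a true_tstamp is supplied, which takes precedence.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the derived timestamp logic described above (my reading of it,
// not the pipeline code): device clock skew is estimated from the two device
// timestamps and removed from the collector timestamp, unless a true_tstamp
// is supplied, which simply wins.
public class DerivedTimestamp {

    static Instant derive(Instant collectorTstamp,
                          Instant dvceCreatedTstamp,
                          Instant dvceSentTstamp,
                          Instant trueTstamp) {
        if (trueTstamp != null) {
            return trueTstamp; // explicit override wins
        }
        // Time the event spent on the device between creation and sending,
        // measured entirely on the device clock, so constant clock skew cancels.
        Duration timeOnDevice = Duration.between(dvceCreatedTstamp, dvceSentTstamp);
        return collectorTstamp.minus(timeOnDevice);
    }

    public static void main(String[] args) {
        Instant collector = Instant.parse("2020-06-01T12:00:05Z");
        Instant created   = Instant.parse("2000-01-01T00:00:00Z"); // bogus device clock
        Instant sent      = Instant.parse("2020-06-01T12:00:00Z");

        // With a wildly wrong dvce_created_tstamp the derived value lands near
        // the bogus date...
        System.out.println(derive(collector, created, sent, null));
        // ...but supplying a true_tstamp overrides it.
        System.out.println(derive(collector, created, sent, Instant.parse("2020-06-01T11:59:58Z")));
    }
}
```

Assuming that formula, a bogus dvce_created_tstamp combined with a sane dvce_sent_tstamp drags derived_tstamp close to the bogus date, which would be consistent with those rows falling outside the BigQuery table’s partition range.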
Additionally, as I was responding to this I realised that there’s potentially a good solution for the general case of this problem, so I’ve created an issue on that topic.