Android device timestamps from 2000

See my comment on GitHub here: https://github.com/snowplow/enrich/issues/15#issuecomment-662583084

But I am seeing some device-created timestamps in the year 2000 where the device-sent timestamp ends up being correct. It seems like the device-created timestamp is just the CURRENT_TIMESTAMP from SQLite (https://github.com/snowplow/snowplow-android-tracker/blob/74b982e3979318c4cef1baf6e12c6c02222736ab/snowplow-tracker/src/main/java/com/snowplowanalytics/snowplow/tracker/storage/EventStoreHelper.java#L45), so this is very strange.
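As a sanity check on that theory, here is a minimal sketch (using the org.xerial sqlite-jdbc driver on the JVM rather than Android's SQLite, so treat it as illustrative) showing that SQLite's CURRENT_TIMESTAMP and System.currentTimeMillis() read the same device wall clock, meaning a bad clock should skew both values together rather than just one of them:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.time.Instant;

public class SqliteClockCheck {
    public static void main(String[] args) throws Exception {
        // In-memory SQLite database; on Android the tracker uses the
        // platform's SQLite, but the clock source is the same device clock.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite::memory:");
             Statement stmt = conn.createStatement()) {
            ResultSet rs = stmt.executeQuery("SELECT CURRENT_TIMESTAMP");
            rs.next();
            // Both values come from the same wall clock, so a wrong clock
            // should produce wrong values in both places at once.
            System.out.println("SQLite CURRENT_TIMESTAMP: " + rs.getString(1));
            System.out.println("System.currentTimeMillis: "
                    + Instant.ofEpochMilli(System.currentTimeMillis()));
        }
    }
}
```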

From what I can see on the events, we are using "v_tracker":"andr-1.4.0".

Thanks for bringing the conversation over here, @camerondavison!

For visibility, the data you shared in the issue:

dvce_created_tstamp       dvce_sent_tstamp          collector_tstamp
2000-12-03T22:24:08.192Z  2020-07-19T22:26:59.457Z  2020-07-19T22:27:02.897Z
2000-12-06T01:31:08.195Z  2020-07-22T03:23:02.173Z  2020-07-22T03:23:04.136Z
2000-12-02T19:43:24.928Z  2020-07-22T03:23:02.383Z  2020-07-22T03:23:04.411Z
2000-12-02T19:44:32.857Z  2020-07-22T03:23:02.383Z  2020-07-22T03:23:04.411Z
2000-12-05T00:39:01.046Z  2020-07-22T03:23:02.173Z  2020-07-22T03:23:04.136Z
2000-12-06T01:12:18.353Z  2020-07-22T03:23:02.173Z  2020-07-22T03:23:04.136Z

A couple of questions:

  • What proportion of Android events look like this?
  • Is it confined to one or a small number of users?
  • Is it confined to one or a small number of sessions?
  • Are you using the timestamp() argument of the event constructors anywhere in your code?

What proportion of Android events look like this?

Less than 1%.

Is it confined to one or a small number of users?

Not as far as I can see; it looks like it is spread all over.

Is it confined to one or a small number of sessions?

No, but looking at the collector timestamp, it does seem to happen more often in the early UTC hours, roughly 00:00–03:00.

Are you using the timestamp() argument of the event constructors anywhere in your code?

no

I’ve done a quick check on a few datasets and can see something similar across both the iOS and Android trackers.

It’d be somewhat explainable if it was all clustered around a single date like 2000-01-01, but it seems to be reasonably spread across the year 2000, and the dvce_sent_tstamps look correct (in theory, most of the time both timestamps should use System.currentTimeMillis()). I can find years ranging from the 1900s to the 2000s, but there seems to be an uptick in events on YEAR-01-01, which suggests there could be a truncation/precision issue, or maybe it’s just a consistent issue with incorrect clocks.
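To make the clock-sync hypothesis concrete, here is a minimal sketch (not tracker code; the timestamps are taken from the sample rows above) of how both fields can read the same clock yet disagree, because they are sampled at different moments:

```java
import java.time.Instant;

/**
 * If the device clock is wrong at event creation time (e.g. an RTC reset
 * to the year 2000) and is corrected by a network time sync before the
 * event is sent, only dvce_created_tstamp ends up in the past.
 */
public class TimestampSkewSketch {
    // Simulated device wall clock, initially reset to a year-2000 value.
    static long deviceClockMillis = Instant.parse("2000-12-03T22:24:08.192Z").toEpochMilli();

    public static void main(String[] args) {
        long dvceCreated = deviceClockMillis;  // sampled at event creation

        // ... device comes online, NTP/carrier time sync corrects the clock ...
        deviceClockMillis = Instant.parse("2020-07-19T22:26:59.457Z").toEpochMilli();

        long dvceSent = deviceClockMillis;     // sampled when the batch is sent

        System.out.println("dvce_created_tstamp: " + Instant.ofEpochMilli(dvceCreated));
        System.out.println("dvce_sent_tstamp:    " + Instant.ofEpochMilli(dvceSent));
    }
}
```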

Thanks both.

I’m flagging the issue to be looked into in case there is some issue in the trackers themselves.

I’m quite unsure though: as you mention, Mike, both dvce_ tstamps come from the same method and clock, so if one is incorrect we would expect to see both be incorrect.

I have seen similarly confusing data from mobile before, which turned out to be down to a combination of airplane mode and incorrect clocks. Consider the case where someone’s battery dies and the phone is recharged while in airplane mode/offline: what we saw in that case was that the clock reset, events were created with odd dvce_created values, and then, when those users came back online, their clocks synced and the data was sent with normal dvce_sent values.

I’m not sure if that explanation covers these cases. It would explain why the data seems to be randomly distributed and so rare but, as you point out Mike, I would also expect to see the data converge around a date like ‘XXXX-01-01’, or whatever date those devices default to.

One last thing I think is worth digging into on this one: what OS versions are we seeing for the data with weird timestamps?

I took a sample of them and grabbed out the .contexts_com_snowplowanalytics_snowplow_mobile_context_1_0_1[].device_model

   1 Pixel 2
   1 Pixel 3 XL
   1 SM-G960U1
   1 SM-G965U1
   1 SM-N976V
   1 VS500PP
   2 SM-G970U
   2 SM-N975U1
   2 moto g(7) power
   3 Moto Z (2)
   4 SM-G955U
   6 SM-G975U
   6 SM-G986U
   8 SM-N970U
  10 moto z3
  13 SM-N960U
  14 XT1710-02
  19 SM-G960U
  20 SM-G950U
  23 SM-G965U
 310 SM-N975U
 450 SM-N950U

Looks like most came from those last two models, with a little from many different devices.

And here is the OS type/version:

 393 ["android","10"]
   1 ["android","6.0.1"]
  14 ["android","8.0.0"]
 490 ["android","9"]

Hi @camerondavison,
that’s weird. I wonder if the timestamp issue happens only to a small group of events.
Does it happen only for a specific kind of event? And if so, all of them or just some of them?

I wonder if this could be the cause, or at least related to the issue.
Apparently, the Galaxy Note 8 updated to Android 9 had issues with its date/time. That would probably explain the incidence on those devices/OS versions.

We are using structured events with contexts for everything. Is that what you are asking?

We are using structured events with contexts for everything. Is that what you are asking?

Yeah, sorry, I’ve been a bit cryptic.
I wonder if there is a common pattern among the events with wrong timestamps that would let us narrow down the source of the problem.

If those wrong timestamps come from the same type of event, generated in a specific part of the app, they could be caused by a .timestamp(...) call mistakenly added somewhere in the code during event building (we recently deprecated that method as it was the cause of issues like this one). See the sketch below for what such a call looks like.
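For anyone auditing their code, this is roughly what a stray call would look like with the 1.x builder API (the category/action values here are made up, and the exact builder signatures should be double-checked against your tracker version):

```java
import com.snowplowanalytics.snowplow.tracker.events.Structured;

public class TimestampGrepTarget {
    // Build an event with an explicit timestamp override. A call like this,
    // left in by mistake with a stale or hard-coded value, would produce a
    // wrong dvce_created_tstamp for every event built through this path
    // while dvce_sent_tstamp stays correct.
    static Structured buildSuspectEvent() {
        return Structured.builder()
                .category("shop")                // hypothetical values
                .action("add-to-basket")
                .timestamp(975882248192L)        // 2000-12-03T22:24:08.192Z
                .build();
    }
}
```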

If instead it isn’t related to the event type but is strongly correlated with the device/OS type, it could be caused by an issue with the clock/timezone of the device, but in that case I would expect the sent timestamp to be wrong too.

Another possibility is that the timestamp is correct when the event is created but becomes wrong when the event is stored in, and extracted from, the SQLite database used by the tracker.
This seems unlikely, because I would expect it to affect all the events from the same user or session, and not just some of them at random.
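For what it’s worth, the storage round trip itself is hard to blame: SQLite INTEGERs are 64-bit, so epoch millis survive intact. A minimal sketch (again using the org.xerial sqlite-jdbc driver on the JVM, as an illustration rather than the tracker’s actual storage code):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class SqliteRoundTripSketch {
    public static void main(String[] args) throws Exception {
        long created = 1595197619457L; // 2020-07-19T22:26:59.457Z, from the sample rows
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite::memory:")) {
            try (Statement s = conn.createStatement()) {
                s.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_ms INTEGER)");
            }
            try (PreparedStatement ps =
                         conn.prepareStatement("INSERT INTO events (created_ms) VALUES (?)")) {
                ps.setLong(1, created);
                ps.executeUpdate();
            }
            try (Statement s = conn.createStatement();
                 ResultSet rs = s.executeQuery("SELECT created_ms FROM events")) {
                rs.next();
                // The value comes back bit-for-bit identical; corruption here
                // would have to hit whole batches, not single random rows.
                System.out.println(rs.getLong(1) == created); // true
            }
        }
    }
}
```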

I think those events should have a common root cause but, at the moment, I don’t have any clues that help me figure out the source of the issue.

Please let us know if you spot any further details that could shed light on this problem.

Yes, it’s strange that in some circumstances only one of the timestamps is off.

From what I can tell this occurs across both the Android and iOS trackers (multiple versions of the tracker, multiple versions of iOS / Android) across different sets of events (e.g., struct / self-describing / screen view).

On further inspection, I can see that it’s also possible for this to occur with the JavaScript tracker, but only on certain devices (mostly smart TVs as far as I can tell), which doesn’t eliminate SQLite as the culprit but points more towards weird clock behaviour (where the created_tstamp is off but dvce_sent is fine in some circumstances). I can also find created_tstamps several decades in the future.

Looking at some of the data, it starts to feel more and more like a potential bug in either the operating system or the hardware itself (some of these devices have very low-cost RTCs) rather than in the trackers.

We just implemented this a couple of weeks ago, so I can assert that we are not using timestamp() anywhere; grepping the code, I do not see it being used.

Alternatively, would it make sense for the enricher to be able to filter out these events? They are showing up as bad BigQuery inserts, since that table is partitioned by time, and that is preventing us from sending them to BigQuery.

Looking at some of the data, it starts to feel more and more like a potential bug in either the operating system or the hardware itself (some of these devices have very low-cost RTCs) rather than in the trackers.

Yeah, I’m starting to feel this way too. While SQLite might be the explanation, it strikes me as super weird that it’s only that field that gets affected. Surely if SQLite encoding were at play here, we’d see it happen for any similarly typed fields…

Alternatively, would it make sense for the enricher to be able to filter out these events?

From the purist point of view, I’m not sure it does: validation only checks the types and specifications described by the JSON Schema, and this data is technically valid, it just has weird values. One might consider adding some kind of ‘minimum’ constraint here, but a big part of the philosophy is that data from any time can be reprocessed again, so in theory any such constraint conflicts a bit with that design. So, while I see the point of view, personally I’d be hesitant to do this in the core validation of the pipeline (if it were my choice, at least).

From the practical point of view, I think it’s an absolutely fair ask and a good idea: custom validation rules have been raised to us before in a different context, and it’s an idea that we’re enthusiastic about. At the moment it’s not on the roadmap, as we have a lot of more immediate deliveries, so it’s still a question of if, not when. (For the record, I’m not involved in making roadmap decisions; I’m just giving my opinion on how things look with regard to that feature.) Personally, I’m super excited about the idea though!

They are showing up as bad BigQuery inserts, since that table is partitioned by time, and that is preventing us from sending them to BigQuery.

I would say that this is not intuitive but is actually the expected behaviour. If you’d like to work around it, setting the true_tstamp will override the derived_tstamp to whatever you set it to (derived_tstamp is normally calculated using the dvce_created, dvce_sent and collector tstamps).
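To see why these rows miss the partition range: derived_tstamp is conventionally computed as the collector timestamp minus the send delay, so a year-2000 dvce_created drags it back twenty years. A minimal sketch of that arithmetic, using the first sample row from this thread (the real enrichment logic lives in the pipeline, so treat this as illustrative):

```java
import java.time.Duration;
import java.time.Instant;

public class DerivedTstampSketch {
    public static void main(String[] args) {
        // Values from the first sample row shared above:
        Instant dvceCreated = Instant.parse("2000-12-03T22:24:08.192Z");
        Instant dvceSent    = Instant.parse("2020-07-19T22:26:59.457Z");
        Instant collector   = Instant.parse("2020-07-19T22:27:02.897Z");

        // derived_tstamp = collector_tstamp - (dvce_sent_tstamp - dvce_created_tstamp):
        // the send delay is subtracted to estimate when the event really happened.
        Instant derived = collector.minus(Duration.between(dvceCreated, dvceSent));

        // With a year-2000 dvce_created, derived lands back in 2000, far
        // outside the BigQuery table's time-partition range.
        System.out.println(derived); // 2000-12-03T22:24:11.632Z
    }
}
```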

Additionally, as I was responding to this, I realised that there’s potentially a good solution to the general case of this problem, so I’ve created an issue on that topic.

Best,