We’ve got a relatively new Snowplow deployment (two weeks in prod), close to the out-of-the-box config: Scala collector -> Kinesis -> S3 sink -> EMR enrich and shred -> load to Redshift.
We’ve had a couple of instances where the Redshift load fails because of an invalid negative value for `derived_tstamp`.
e.g. derived_tstamp=-5967-09-11 11:10:28.569
Can someone tell me why this might have happened and how to stop it happening?
Hi @alex, the timestamps are below. It looks like the device sent timestamp is bizarrely out of whack with the device created timestamp. I’m not sure why that would be; the browser user-agent looks like Chrome on Win7.
That’s interesting - the JavaScript library just uses `new Date().getTime()` to set `dvce_sent_tstamp` (stm), which should just be an epoch timestamp in milliseconds. Which version of the JavaScript library are you running?
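For reference, the derived timestamp is computed roughly like this. The Scala below is only a sketch with illustrative names, not the actual enrich code: enrichment corrects `collector_tstamp` by the device clock skew, measured as `dvce_sent_tstamp` minus `dvce_created_tstamp`, so a wildly wrong sent timestamp makes the correction overshoot.

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

// Sketch only: enrichment corrects the collector timestamp by the
// device clock skew (sent minus created). Names are illustrative.
def deriveTstamp(collector: Instant, dvceCreated: Instant, dvceSent: Instant): Instant = {
  val skewMillis = dvceSent.toEpochMilli - dvceCreated.toEpochMilli
  collector.minus(skewMillis, ChronoUnit.MILLIS)
}

// A spoofed dvce_sent_tstamp ~8,000 years in the future produces a
// derived_tstamp ~8,000 years in the past, like the -5967 date above.
```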
We’ve got hundreds of millions of records in the event table and only two instances of this happening, so it doesn’t seem like a typical issue - possibly someone manually messing with us. I’d be happy to drop the record or set the `derived_tstamp` to the `collector_tstamp`. The main issue is that one bad record, spoofable from the browser, can break the whole ETL job. Can we add some enforcement to make sure the derived timestamp is valid before loading into Redshift?
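In the meantime, something like this hypothetical guard, run over the enriched output before the load, would do what I describe. To be clear, this is not a built-in Snowplow hook, and the names are made up:

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

// Hypothetical guard, not a built-in Snowplow hook: if the derived
// timestamp falls outside a sane window, fall back to collector_tstamp
// so one spoofed event can't fail the whole Redshift COPY.
def sanitizeDerived(collector: Instant, derived: Instant): Instant = {
  val earliest = Instant.parse("2000-01-01T00:00:00Z")
  val latest   = Instant.now().plus(1, ChronoUnit.DAYS) // tolerate a day of clock drift
  if (derived.isBefore(earliest) || derived.isAfter(latest)) collector
  else derived
}
```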
Sorry for replying to this old post, but will a filter mechanism or something similar be implemented in the future?
This would be really appreciated by my team.