We’ve got a relatively new Snowplow deployment (two weeks in prod), close to the out-of-the-box config: Scala collector -> Kinesis -> S3 sink -> EMR enrich and shred -> load to Redshift.
We’ve had a couple of instances where the Redshift load fails because of an invalid negative value for `derived_tstamp`.
e.g. derived_tstamp=-5967-09-11 11:10:28.569
Can someone tell me why this might have happened and how to stop it happening?
Hi @alex, the timestamps are below. It looks like the device sent timestamp is bizarrely out of whack with the device created timestamp. I’m not sure why that would be; the browser user-agent looks like Chrome on Win7.
That’s interesting - the JavaScript library just uses `new Date().getTime()` to set `dvce_sent_tstamp` (stm), which should just be an epoch timestamp in milliseconds. Which version of the JavaScript library are you running?
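For reference, the derived timestamp is computed roughly like this. The Scala below is only a sketch with illustrative names, not the actual enrich code: enrichment corrects `collector_tstamp` by the device clock skew, measured as `dvce_sent_tstamp` minus `dvce_created_tstamp`, so a wildly wrong sent timestamp makes the correction overshoot.

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

// Sketch only: enrichment corrects the collector timestamp by the
// device clock skew (sent minus created). Names are illustrative.
def deriveTstamp(collector: Instant, dvceCreated: Instant, dvceSent: Instant): Instant = {
  val skewMillis = dvceSent.toEpochMilli - dvceCreated.toEpochMilli
  collector.minus(skewMillis, ChronoUnit.MILLIS)
}

// A spoofed dvce_sent_tstamp ~8,000 years in the future produces a
// derived_tstamp ~8,000 years in the past, like the -5967 date above.
```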
We’ve got hundreds of millions of records in the event table and only two instances of this happening, so it doesn’t seem like a typical issue - possibly someone manually messing with us. I’d be happy to drop the record or set the `derived_tstamp` to the `collector_tstamp`. The main issue is that one bad record, spoofable from the browser, can break the whole ETL job. Can we add some enforcement to make sure the derived timestamp is valid before loading into Redshift?
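In the meantime, something like this hypothetical guard, run over the enriched output before the load, would do what I describe. To be clear, this is not a built-in Snowplow hook, and the names are made up:

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

// Hypothetical guard, not a built-in Snowplow hook: if the derived
// timestamp falls outside a sane window, fall back to collector_tstamp
// so one spoofed event can't fail the whole Redshift COPY.
def sanitizeDerived(collector: Instant, derived: Instant): Instant = {
  val earliest = Instant.parse("2000-01-01T00:00:00Z")
  val latest   = Instant.now().plus(1, ChronoUnit.DAYS) // tolerate a day of clock drift
  if (derived.isBefore(earliest) || derived.isAfter(latest)) collector
  else derived
}
```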
Sorry for replying to this old post, but will a filter mechanism or something similar be implemented in the future?
This would be really appreciated by my team.