I can also see that the EMR job that ran before the storage loader took twice as long as usual, which leads me to believe there was some sort of issue while running the ETL job.
Does that make sense? Can you please tell me how I might go about re-running it?
Just guessing - but did a field in your mobile context exceed the maximum length allowed?
Not sure if that’s directly related; I would have thought that would have shown up at the collector level rather than in the ETL. Maybe try a character count between one event that made it through vs. one that is failing, something like the sketch below.
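A crude way to do that, assuming you’ve pulled one good line and one failing line out of the enriched TSV into local files (the file names here are just placeholders):

```bash
# Print per-field character counts for a good event and a failing event,
# then diff the two to see which field (if any) blows up or gets cut short.
awk -F'\t' '{ for (i = 1; i <= NF; i++) printf "field %d: %d chars\n", i, length($i) }' good_event.tsv > good_counts.txt
awk -F'\t' '{ for (i = 1; i <= NF; i++) printf "field %d: %d chars\n", i, length($i) }' bad_event.tsv > bad_counts.txt
diff good_counts.txt bad_counts.txt
```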
Thanks, but I suspect that’s not the problem. This happened with our custom context AND Snowplow’s mobile context. It seems that for some reason the rows are just cut off in the middle, which points to the ETL job, because if the data had been malformed coming from the client, it would have gone to “bad rows”.
It feels to me like something went wrong with the Hadoop job and it just needs re-running. I’m just not sure how to get back to that point in time (re-running the ETL Runner) when the storage loader has already started running.
Have you checked your enriched archive and shredded archive in S3 to see if the data was truncated during enrich vs. during shred? The error you mention happens during load, so I wonder when the data got chopped off.
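Something along these lines could help narrow it down; the bucket names, run folder and file names are placeholders, and I’m assuming gzipped output in both archives:

```bash
# Grab one part file from each archive for the suspect run (paths are placeholders).
aws s3 cp s3://your-archive-bucket/enriched/good/run=2017-01-01-00-00-00/part-00000.gz enriched-sample.gz
aws s3 cp s3://your-archive-bucket/shredded/good/run=2017-01-01-00-00-00/part-00000.gz shredded-sample.gz

# Look at the distribution of line lengths in each sample; truncated rows
# should stand out as a cluster of unusually short lines.
gzip -cd enriched-sample.gz | awk '{ print length($0) }' | sort -n | uniq -c | head
gzip -cd shredded-sample.gz | awk '{ print length($0) }' | sort -n | uniq -c | head
```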
This happened to us again a couple of days ago. I tried deleting the enriched and shredded folders for this run, moved the files back from the raw:in bucket to the raw:processing bucket and reran the EMR job with --skip staging. The job then ran 10 times longer than usual (11 hours instead of 1.5) and also generated roughly 10 times the usual amount of data in the shredded bucket. Our Redshift can’t handle that much data, so I had to run the storage loader with --skip download,load so we could move on to processing the events that had been waiting for almost 2 days in the incoming bucket.
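For reference, the steps above looked roughly like this (bucket names, the run folder and config paths are placeholders, not our real ones):

```bash
# 1. Delete the enriched and shredded output for the suspect run.
aws s3 rm --recursive s3://our-snowplow-data/enriched/good/run=2017-01-01-00-00-00/
aws s3 rm --recursive s3://our-snowplow-data/shredded/good/run=2017-01-01-00-00-00/

# 2. Move the raw files back from the in bucket to the processing bucket.
aws s3 mv --recursive s3://our-snowplow-raw/in/ s3://our-snowplow-raw/processing/

# 3. Re-run EmrEtlRunner, skipping staging since the raw files are already in processing.
./snowplow-emr-etl-runner --config config/config.yml --skip staging

# 4. Afterwards, run the storage loader skipping download and load, so the pipeline
#    could move on to the events waiting in the incoming bucket.
./snowplow-storage-loader --config config/config.yml --skip download,load
```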
Does what happened make any sense to anyone? Is there any way to recover those events at all? @alex any thoughts on this?