I can also see that the EMR job that ran before the storage loader took twice as long as usual, which leads me to believe there was some sort of issue while running the ETL job.
Does that make sense? Can you please tell me how I might go about re-running it?
Just guessing - but did a field in your mobile context exceed the maximum length allowed?
Not sure if that’s directly related; I would have thought that would have shown up at the collector level rather than in the ETL. Maybe try a character count between one event that made it through vs. one that is failing, something like the sketch below.
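A crude way to do that, assuming you’ve pulled one good line and one failing line out of the enriched TSV into local files (the file names here are just placeholders):

```bash
# Print per-field character counts for a good event and a failing event,
# then diff the two to see which field (if any) blows up or gets cut short.
awk -F'\t' '{ for (i = 1; i <= NF; i++) printf "field %d: %d chars\n", i, length($i) }' good_event.tsv > good_counts.txt
awk -F'\t' '{ for (i = 1; i <= NF; i++) printf "field %d: %d chars\n", i, length($i) }' bad_event.tsv > bad_counts.txt
diff good_counts.txt bad_counts.txt
```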
Thanks, but I suspect that’s not the problem. This happened with our custom context AND Snowplow’s mobile context. It seems that for some reason the rows are just cut off in the middle, which points to the ETL job, because if the data had been malformed coming from the client, it would have gone to “bad rows”.
It feels to me like something went wrong with the Hadoop job and it just needs re-running. I’m just not sure how to get back to that point in time (re-running the ETL Runner) when the storage loader has already started running.
Have you checked your enriched archive and shredded archive in S3 to see if the data was truncated during enrich vs. during shred? The error you mention happens during load, so I wonder when the data got chopped off.
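Something along these lines could help narrow it down; the bucket names, run folder and file names are placeholders, and I’m assuming gzipped output in both archives:

```bash
# Grab one part file from each archive for the suspect run (paths are placeholders).
aws s3 cp s3://your-archive-bucket/enriched/good/run=2017-01-01-00-00-00/part-00000.gz enriched-sample.gz
aws s3 cp s3://your-archive-bucket/shredded/good/run=2017-01-01-00-00-00/part-00000.gz shredded-sample.gz

# Look at the distribution of line lengths in each sample; truncated rows
# should stand out as a cluster of unusually short lines.
gzip -cd enriched-sample.gz | awk '{ print length($0) }' | sort -n | uniq -c | head
gzip -cd shredded-sample.gz | awk '{ print length($0) }' | sort -n | uniq -c | head
```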
This happened to us again a couple of days ago. I tried deleting the enriched and shredded folders for this run, moved the files back from the raw:in bucket to the raw:processing bucket and reran the EMR job with --skip staging. The job then ran 10 times longer than usual (11 hours instead of 1.5) and also generated roughly 10 times the usual amount of data in the shredded bucket. Our Redshift can’t handle that much data, so I had to run the storage loader with --skip download,load so we could move on to processing the events that had been waiting for almost 2 days in the incoming bucket.
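For reference, the steps above looked roughly like this (bucket names, the run folder and config paths are placeholders, not our real ones):

```bash
# 1. Delete the enriched and shredded output for the suspect run.
aws s3 rm --recursive s3://our-snowplow-data/enriched/good/run=2017-01-01-00-00-00/
aws s3 rm --recursive s3://our-snowplow-data/shredded/good/run=2017-01-01-00-00-00/

# 2. Move the raw files back from the in bucket to the processing bucket.
aws s3 mv --recursive s3://our-snowplow-raw/in/ s3://our-snowplow-raw/processing/

# 3. Re-run EmrEtlRunner, skipping staging since the raw files are already in processing.
./snowplow-emr-etl-runner --config config/config.yml --skip staging

# 4. Afterwards, run the storage loader skipping download and load, so the pipeline
#    could move on to the events waiting in the incoming bucket.
./snowplow-storage-loader --config config/config.yml --skip download,load
```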
Does what happened make any sense to anyone? Is there any way to recover those events at all? @alex any thoughts on this?