Encoded bad rows in Elasticsearch - advanced debugging support

ihor · November 9, 2017, 12:08am

Indeed, bad events generally take the form:

{
   "line": "original raw event as a string record",
   "errors": [
      {
         "level": "error",
         "message": "Here's what's wrong with this event"
      }
   ],
   "failure_tstamp": "timestamp"
}

This holds regardless of which collector you use, though in your case the "line" value would be base64 encoded.
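To make that concrete, here is a minimal Python sketch that parses one bad row of the shape above and base64-decodes its "line" field. It assumes the standard base64 alphabet; the names BadRow, parse_bad_row and decode_line are purely illustrative. For Thrift-based collectors the decoded bytes are still a binary Thrift payload, so a Thrift deserializer would be needed to read them further.

```python
import base64
import json
from dataclasses import dataclass
from typing import List


@dataclass
class BadRow:
    line: str            # original raw event (base64 for Thrift payloads)
    errors: List[dict]   # each entry has "level" and "message"
    failure_tstamp: str


def parse_bad_row(raw: str) -> BadRow:
    """Parse one bad-row JSON document into a BadRow (hypothetical helper)."""
    doc = json.loads(raw)
    return BadRow(
        line=doc["line"],
        errors=doc.get("errors", []),
        failure_tstamp=doc.get("failure_tstamp", ""),
    )


def decode_line(line: str) -> bytes:
    """Base64-decode the "line" field, assuming the standard alphabet.

    validate=True makes invalid characters raise binascii.Error,
    which is a subclass of ValueError.
    """
    return base64.b64decode(line, validate=True)


if __name__ == "__main__":
    raw = ('{"line": "aGVsbG8=", '
           '"errors": [{"level": "error", "message": "example"}], '
           '"failure_tstamp": "2017-11-09T00:00:00.000Z"}')
    row = parse_bad_row(raw)
    print(decode_line(row.line))  # prints b'hello'
```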

Unfortunately, we do not currently provide a way to present that data decoded.

Here’s a tutorial describing how to debug bad data in Elasticsearch using curl: Debugging bad rows in Elasticsearch using curl (without Kibana) [tutorial]. The approach is to filter out the events we do not care about (those generated by bots, those resulting from OPTIONS requests, etc.). The remaining events then need to be examined to determine the reason for each failure, which means decoding the value of the "line" field (and then fixing the underlying cause).
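The filtering step could be sketched in Python as below: one function builds an Elasticsearch bool query that excludes rows whose "errors.message" matches any ignored phrase (the body could then be sent to the _search endpoint, e.g. with curl), and another applies a roughly equivalent plain-substring filter client-side. The field path follows the bad-row shape above; the phrases in IGNORED are hypothetical placeholders, not the actual messages produced for bots or OPTIONS requests.

```python
import json
from typing import Iterable, List


def build_exclusion_query(phrases: Iterable[str]) -> dict:
    """Elasticsearch query DSL body excluding bad rows whose
    errors.message contains any of the given phrases."""
    return {
        "query": {
            "bool": {
                "must_not": [
                    {"match_phrase": {"errors.message": p}} for p in phrases
                ]
            }
        }
    }


def remaining_rows(rows: Iterable[dict], phrases: Iterable[str]) -> List[dict]:
    """Client-side approximation: keep rows none of whose error messages
    contain an ignored phrase (plain substring match, not analyzed)."""
    phrases = list(phrases)
    kept = []
    for row in rows:
        messages = [e.get("message", "") for e in row.get("errors", [])]
        if not any(p in m for p in phrases for m in messages):
            kept.append(row)
    return kept


if __name__ == "__main__":
    # Hypothetical placeholders: substitute the messages you actually see.
    IGNORED = ["HYPOTHETICAL bot message", "HYPOTHETICAL OPTIONS message"]
    print(json.dumps(build_exclusion_query(IGNORED), indent=2))
```

Note that Elasticsearch's match_phrase works on analyzed text, so the client-side substring check only approximates it; whatever survives either filter is what needs its "line" decoded and inspected.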

With regard to recovering data: if you are not running the batch side of a Lambda architecture, that is, if the “bad” events are not saved to S3, then I’m afraid you won’t be able to recover them. Having the events in S3 allows you to recover them and feed them back into the batch pipeline with the help of Hadoop Event Recovery. I think that’s reasonable given the nature of “real-time” vs “batch” pipelines.
