Hello!
I am trying to recover events from schema_violations bad rows using snowplow-event-recovery.
The source of the errors is a redundant field, “redundant_field”. According to https://jsonpathfinder.com/, its path is “x.data.payload.raw.parameters[21].value.data.data.redundant_field” (the .value part is base64-encoded).
My idea is to remove that field from the failed events.
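To make the intended change concrete, here is a minimal Python sketch of what removing the field from the encoded value amounts to. It is only an illustration, not how snowplow-event-recovery works internally. It assumes the ue_px value is URL-safe base64-encoded JSON wrapped in the usual unstruct_event envelope; the sample payload and the helper names (decode_ue_px, encode_ue_px, remove_field) are made up for the example.

```python
import base64
import json


def decode_ue_px(value: str) -> dict:
    """Decode a URL-safe base64 string (padding optional) into a JSON object."""
    padded = value + "=" * (-len(value) % 4)  # restore stripped '=' padding
    return json.loads(base64.urlsafe_b64decode(padded))


def encode_ue_px(event: dict) -> str:
    """Serialize a JSON object and encode it as unpadded URL-safe base64."""
    raw = json.dumps(event, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def remove_field(value: str, field: str) -> str:
    """Drop `field` from the inner data.data object and re-encode the event."""
    event = decode_ue_px(value)
    event["data"]["data"].pop(field, None)  # no error if the field is absent
    return encode_ue_px(event)


# Hypothetical payload: only the schema URI and redundant_field come from the post.
sample = {
    "schema": "iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",
    "data": {
        "schema": "iglu:com.mycompany/my_events/jsonschema/1-0-0",
        "data": {"wanted_field": 1, "redundant_field": "x"},
    },
}

encoded = encode_ue_px(sample)
cleaned = decode_ue_px(remove_field(encoded, "redundant_field"))
print(cleaned["data"]["data"])
```

Decoding one real ue_px value from a bad row this way is also a quick check that the field really sits at data.data.redundant_field inside the encoded part.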
I’ve looked at the example recovery configs in the Snowplow documentation and in git, and my recovery config looks like this:
{
  "schema": "iglu:com.snowplowanalytics.snowplow/recoveries/jsonschema/4-0-0",
  "data": {
    "iglu:com.snowplowanalytics.snowplow.badrows/schema_violations/jsonschema/2-0-*": [
      {
        "name": "my-events-remove-redundant-field",
        "conditions": [
          {
            "op": "Test",
            "path": "$.payload.raw.parameters.ue_px.data.schema",
            "value": {
              "value": "iglu:com.mycompany/my_events/jsonschema/1-0-0"
            }
          }
        ],
        "steps": [
          {
            "op": "Remove",
            "path": "$.raw.parameters.[?(@.name=~ue_px)].value.data.data.redundant_field"
          }
        ]
      }
    ]
  }
}
I put the bad events in a text file and ran this command in the CLI:
./snowplow-event-recovery run --config $PWD/recovery_config.json -o $PWD/output/ -i $PWD/input/ --resolver $PWD/iglu_resolver.json
And I got:
Total Lines: 69, Recovered: 0
OK! Successfully ran recovery.
Everything ends up in output/bad.txt; output/good.txt is empty.
Maybe it’s impossible to remove a redundant field from the encoded data part? Or am I missing something?