Schema_violations errors recovery

Hello!

I am trying to recover errors from schema_violations using snowplow-event-recovery.

Source of errors - redundant field “redundant_field.” https://jsonpathfinder.com/ says it has path “x.data.payload.raw.parameters[21].value.data.data.redundant_field” (.value is encoded in base64)
My idea is to remove that field from errors.

I’ve seen examples of recovery config in Snowplow Documentation and in git and my recovery config looks like this:

{
  "schema": "iglu:com.snowplowanalytics.snowplow/recoveries/jsonschema/4-0-0",
  "data": {
    "iglu:com.snowplowanalytics.snowplow.badrows/schema_violations/jsonschema/2-0-*": [
      {
        "name": "my-events-remove-redundant-field,
        "conditions": [
          {
            "op": "Test",
            "path": "$.payload.raw.parameters.ue_px.data.schema",
            "value": {
              "value": "iglu:com.mycompany/my_events/jsonschema/1-0-0"
            }
          }
        ],
        "steps": [
          {
            "op": "Remove",
            "path": "$.raw.parameters.[?(@.name=~ue_px)].value.data.data.redundant_field"
          }
        ]
      }
    ]
  }
}

I prepared bad events in a text file and ran the command in the cli:

./snowplow-event-recovery run --config $PWD/recovery_config.json -o $PWD/output/ -i $PWD/input/ --resolver $PWD/iglu_resolver.json

And I got:

Total Lines: 69, Recovered: 0
OK! Successfully ran recovery.

Everything is in the output/bad.txt; the output/good.txt is empty.

Maybe it’s impossible to remove a redundant field from the encoded data part? Or maybe am I missing something?

It seems the “Remove” operation works more like “nullify”: it doesn’t remove fields, it nullifies its values.

So I decided to use “Replace” on encoded data and work with it like with string, replacing field and its value:

{
  "schema": "iglu:com.snowplowanalytics.snowplow/recoveries/jsonschema/4-0-0",
  "data": {
    "iglu:com.snowplowanalytics.snowplow.badrows/schema_violations/jsonschema/2-0-*": [
      {
        "name": "my-events-remove-redundant-field,
        "conditions": [
          {
            "op": "Test",
            "path": "$.payload.raw.parameters.ue_px.data.schema",
            "value": {
              "value": "iglu:com.mycompany/my_events/jsonschema/1-0-0"
            }
          }
        ],
        "steps": [
          {
            "op": "Replace",
            "path": "$.raw.parameters.[?(@.name=~ue_px)].value.data.data",
            "match": "\"redundant_field\":\\s*[^,]+,",
            "value": ""
          }
        ]
      }
    ]
  }
}

At least in tests, it works:

Total Lines: 439, Recovered: 38
OK! Successfully ran recovery