{
"line": "<base64 encoded string>",
"errors": [
{
"level": "error",
"message": "error: object instance has properties which are not allowed by the schema: [\"submitted\"]\n level: \"error\"\n schema: {\"loadingURI\":\"#\",\"pointer\":\"\"}\n instance: {\"pointer\":\"\"}\n domain: \"validation\"\n keyword: \"additionalProperties\"\n unwanted: [\"submitted\"]\n"
}
],
"failure_tstamp": "2021-03-21T10:03:26.280Z"
}
We are trying to use snowplow-event-recovery-spark-0.1.0.jar to recover this bad row, but we are unsure what to use as the error filter in the configuration. Specifically, what should go in the "error" property? Should we just copy the message field, as follows?
{
"schema": "iglu:com.snowplowanalytics.snowplow/recoveries/jsonschema/1-0-0",
"data": [
{
"name": "RemoveFromBody",
"error": "error: object instance has properties which are not allowed by the schema: [\"submitted\"]\n level: \"error\"\n schema: {\"loadingURI\":\"#\",\"pointer\":\"\"}\n instance: {\"pointer\":\"\"}\n domain: \"validation\"\n keyword: \"additionalProperties\"\n unwanted: [\"submitted\"]\n",
"toRemove": "\"submitted\":\".*\",?"
}
]
}
@onnu_thonala_ad, yes. You can use either the whole string, with exactly the same characters as in the bad row's error message, or just a part of it that is sufficient to identify the rejected events you are after. For example, you could use just: object instance has properties which are not allowed by the schema: ["submitted"] (with the inner quotes escaped as \"submitted\" inside the JSON configuration).
# Removes a field which shouldn't be there
{
"name": "RemoveFromBody",
"error": "object instance has properties which are not allowed by the schema: [\"test\"]",
"toRemove": "\"test\":\".*\",?"
}
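To make the roles of "error" and "toRemove" concrete, here is a minimal sketch in plain Scala. It is not the recovery job's own code: the `Scenario` class, the substring matching in `matches`, and the regex deletion in `mutate` are assumptions made for illustration, and the sample body is invented.

```scala
object RemoveFromBodySketch {
  // Hypothetical stand-in for a recovery scenario, not the library's class.
  final case class Scenario(error: String, toRemove: String) {
    // Assumption: the scenario applies when any error message contains `error`.
    def matches(messages: List[String]): Boolean =
      messages.exists(_.contains(error))

    // Assumption: `toRemove` is a regex and every match is deleted from the body.
    def mutate(body: String): String =
      body.replaceAll(toRemove, "")
  }

  // Start of the error message from the bad row above.
  val sampleMessage: String =
    "error: object instance has properties which are not allowed by the schema: [\"submitted\"]\n level: \"error\"\n"

  // Invented body, for illustration only.
  val sampleBody: String = "{\"submitted\":\"yes\",\"id\":\"42\"}"

  val partialError: String =
    "object instance has properties which are not allowed by the schema: [\"submitted\"]"

  // Pattern in the style of the example above (greedy `.*`).
  val greedy: Scenario = Scenario(partialError, "\"submitted\":\".*\",?")

  // Tighter variant (an assumption, not from the thread): stop at the closing quote.
  val tight: Scenario = Scenario(partialError, "\"submitted\":\"[^\"]*\",?")

  def main(args: Array[String]): Unit = {
    println(greedy.matches(List(sampleMessage)))
    println(greedy.mutate(sampleBody))
    println(tight.mutate(sampleBody))
  }
}
```

On this invented body, the partial error string is enough to select the row. The greedy `.*` pattern, however, runs on to the last double quote in the body and removes the "id" field as well, while the `[^"]*` variant stops at the closing quote of the "submitted" value. Whether this matters depends on the shape of the real payload, so it is worth testing the pattern against a decoded sample first.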
Hello @ihor, I tried the regex you suggested, but it didn't work. I tested it locally and it threw errors: parsing the recoveryScenarios JSON with the circe parser fails for nested JSON.
// Parse the scenarios file and decode its "data" array, failing fast on invalid input
val recoveryScenarios = io.circe.parser.parse(getResourceContent("/recovery_scenarios.json"))
  .flatMap(_.hcursor.get[List[RecoveryScenario]]("data"))
  .fold(f => throw new Exception(s"invalid recovery scenarios: ${f.getMessage}"), identity)
I tried three different regexes, but none of them worked. Screenshots are attached for reference.
@onnu_thonala_ad, you do not need to share any sensitive data, only a single example of the bad row with the sensitive data masked. I'm only interested in the structure of your bad event.
I'm afraid the support you are suggesting is beyond what I can do for open-source (OS) users.
Hey @onnu_thonala_ad, this is already formatted and contains only the custom data. I meant the whole bad row, including the error message. Could you decode the encoded values, remove the sensitive data, encode them back, and post the whole bad row?
I'm also amending the title of this post: the version you need to use for recovery is 0.1.0 (for the old bad row format), not 1.0.0 (for the new format).
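The decode, mask, and re-encode step requested above could be sketched as follows, using only the Java standard library from Scala. The field name, the sample JSON, and the choice of the standard (not URL-safe) Base64 alphabet are assumptions for illustration. If the decoded bytes are not plain text, a string round trip like this may not be appropriate.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64
import java.util.regex.Pattern

object MaskEncodedValue {
  // Decode a Base64 string into UTF-8 text (assumes the payload is text).
  def decode(encoded: String): String =
    new String(Base64.getDecoder.decode(encoded), UTF_8)

  // Encode UTF-8 text back into Base64.
  def encode(plain: String): String =
    Base64.getEncoder.encodeToString(plain.getBytes(UTF_8))

  // Replace the value of a string field with "***", keeping the structure intact.
  def mask(json: String, field: String): String =
    json.replaceAll("(\"" + Pattern.quote(field) + "\":\")[^\"]*", "$1***")

  // Full round trip: decode, mask one field, encode again.
  def maskEncoded(encoded: String, field: String): String =
    encode(mask(decode(encoded), field))

  def main(args: Array[String]): Unit = {
    // Invented sample value, standing in for a real encoded field.
    val sample = encode("{\"email\":\"jane@example.com\",\"plan\":\"pro\"}")
    println(decode(maskEncoded(sample, "email")))
  }
}
```

Masking the values while leaving the keys and nesting untouched preserves the structure of the bad event, which is the part needed to diagnose the recovery scenario.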