I have been using the Snowplow Scala SDK and Python SDK to convert TSV enriched events into JSON.
The issue is that the generated JSON cannot be converted back into a Snowplow Event object.
In the case of the Python SDK:
It transforms the TSV event into JSON (see Python Analytics SDK Event Transformer · snowplow/snowplow Wiki · GitHub), flattening the self-describing JSONs along the way, so the result cannot be converted back to a Snowplow event using the implicit encoder and decoder from the Scala SDK.
I also didn’t find any method for this in the Python SDK.
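To illustrate why the flattened form is hard to reverse, here is a rough sketch in plain Scala. `flatKey` is a hypothetical helper, not a function from either SDK, and the naming rule is only my approximation of the flattened keys (the real transformer also snake-cases the schema name, which is skipped here). The point is that the flat key keeps only the MODEL part of the schema version, so two different schema versions end up under the same key.

```scala
object FlattenSketch {
  // Matches Iglu schema URIs such as iglu:com.acme/user/jsonschema/1-0-0
  private val IgluUri = """iglu:([^/]+)/([^/]+)/jsonschema/(\d+)-(\d+)-(\d+)""".r

  // Hypothetical helper (an assumption, not SDK code): builds a flat key
  // from a prefix and a schema URI, dropping REVISION and ADDITION.
  def flatKey(prefix: String, schemaUri: String): Option[String] =
    schemaUri match {
      case IgluUri(vendor, name, model, _, _) =>
        Some(s"${prefix}_${vendor.replace('.', '_')}_${name}_$model")
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    // Two different versions of the same (made-up) schema
    val v100 = flatKey("contexts", "iglu:com.acme/user/jsonschema/1-0-0")
    val v123 = flatKey("contexts", "iglu:com.acme/user/jsonschema/1-2-3")
    println(v100)         // Some(contexts_com_acme_user_1)
    println(v100 == v123) // true: the exact version cannot be recovered
  }
}
```

So even with a custom decoder for the flattened JSON, the full schema version would have to come from somewhere else.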
Now, in the case of the Scala SDK:
The generated JSON can be converted back to the Event object, as long as the self-describing JSONs are not flattened when it is produced.
Sample code that I tried:
import com.snowplowanalytics.snowplow.analytics.scalasdk.Event
import io.circe.parser

// Parse the enriched TSV line into an Event
val snowEvent: Event = Event.parse(page_view_tsv_format).getOrElse(throw new RuntimeException("not able to parse"))
println("got the event object " + snowEvent)
// Convert to JSON, keeping the self-describing JSONs nested (not flattened)
val snowJsonFormat = snowEvent.toJson(lossy = false)
println("converted it successfully into json " + snowJsonFormat)
// Now converting it back to the Event object; this works
val jsonToEvent = parser.decode[Event](snowJsonFormat.noSpaces)
However, if in the example above we instead convert with
val snowJsonFormat = snowEvent.toJson(lossy = true)
and then try to convert it back, we get this error:
Left(DecodingFailure(Attempt to decode value on failed cursor, List(DownField(contexts))))
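The `DownField(contexts)` in that error suggests the decoder is looking for a nested `contexts` field that the lossy output no longer has. A minimal sketch of the same kind of failure using circe alone, with no Snowplow code involved: the JSON string is a made-up, heavily trimmed stand-in for lossy output, and the flattened key name in it is an assumption for illustration.

```scala
import io.circe.{Error, Json}
import io.circe.parser.parse

object LossyDecodeSketch {
  // Made-up stand-in for lossy output: the context sits under a
  // flattened key, and there is no top-level "contexts" field.
  val lossyJson: String =
    """{"app_id":"web","contexts_com_acme_user_1":[{"id":"u1"}]}"""

  // Navigate to "contexts" the way a decoder for the nested format would.
  // The field is missing, so the cursor fails and decoding returns a Left.
  val contexts: Either[Error, Json] =
    parse(lossyJson).flatMap(_.hcursor.downField("contexts").as[Json])

  def main(args: Array[String]): Unit =
    println(contexts) // a Left wrapping a DecodingFailure
}
```

If this is what is happening, the JSON itself is valid and only its shape differs from what the Event decoder expects.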
I’m not sure we ever instrumented a method to transform the data back from a JSON to a TSV event, but perhaps I’m wrong - is there some example somewhere in the documentation and/or code that led you to the suggestion that this was possible? If so can you point me to it so I can take a look?
In either case, my instinct is to ask why you would like to do this. It’s not a use case I’ve seen anyone need before, so I wonder whether your goal can be achieved some other way.
(For context, the Analytics SDKs are typically used when consuming data from the enriched stream, for example in building a real-time application. Typically once it’s in JSON format it doesn’t need to go back to TSV in that world - TSV is usually consumed by the loaders.)
Fair point - I didn’t mean to suggest that there isn’t a possible use case, just that I haven’t come across one - so I wonder if we can skin this particular cat some other way.
Your use case makes sense - but to play devil’s advocate, I have a gut feeling that the ideal way to support this kind of behaviour in the longer term is to make the enrichment component of the pipeline itself more extensible and flexible.
I’m thinking of a ‘plugin’ style enrichment, where one would write their own enrichment module and drop it into the pipeline. Perhaps this doesn’t play well with some of our design principles.
I guess it just feels a bit complicated to instrument what you’ve outlined, because all of the loaders now depend on schemas - so if you’re transforming the events in-between there’s potential complication there.