Hi @Jan-Eric_Duden, that is a pretty accurate picture. The TSV in step 3 is a mix of plain-vanilla TSV and JSON: most columns hold plain values, but some of the values are JSON blobs.
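To illustrate what that mixed format means for anyone consuming it, here is a minimal Python sketch. The column names (`event_id`, `app_id`, `contexts`) and the sample rows are made up for the example and are not the actual step 3 layout; which columns carry JSON is also an assumption.

```python
import csv
import io
import json

# Hypothetical sample: column names and values are invented for illustration.
SAMPLE = (
    "event_id\tapp_id\tcontexts\n"
    'e1\tweb\t{"schema": "example", "data": {"page": "/home"}}\n'
    "e2\tmobile\t\n"
)

# Assumption: the set of columns whose values are JSON blobs.
JSON_COLUMNS = {"contexts"}


def parse_rows(text):
    """Split tab-separated rows, then decode the JSON-valued columns."""
    # QUOTE_NONE stops the csv module from treating the double quotes
    # inside a JSON blob as field quoting.
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = []
    for row in reader:
        for col in JSON_COLUMNS:
            raw = row.get(col)
            # Empty cells become None instead of failing in json.loads.
            row[col] = json.loads(raw) if raw else None
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE)
```

The point of the sketch is that a reader needs two passes: a plain tab split for the row, then a JSON decode for specific columns. That is part of the trade-off behind the format discussion here.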
We are constantly thinking about how to improve the formats, but we don’t currently have any short-term plans for Avro support. Different formats have different strengths depending on the use case, so even though we are very keen to streamline and simplify the pipeline, that is not necessarily the main driving force behind these decisions. For example, adding support for Parquet in step 3 would further diversify the formats, but it would also unlock use cases for data in data lakes and the like.