Emr-etl-runner works with LZO but not with GZ

Hey @alex!

Do you have any ideas on how to split/join LZO files to fully utilize Spark parallelism, per this thread? This seems non-trivial to do with a bash script, given the complexity of generating the file format.

Thanks!
Bernardo
