Emr-etl-runner works with LZO but not with GZ

bernardosrulzon · September 5, 2017, 2:20pm

Do you have any ideas on how to split/join LZO files to fully utilize Spark parallelism, per this thread? This seems non-trivial to do with a bash script, given the complexity of generating the file format.

Thanks!
Bernardo
