Rdb-transformer can not write to output file

Dmitry_Medkov · May 24, 2023, 3:51pm

The following error occurred while the RDB Transformer was processing data on the EMR cluster:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2684.1 in stage 4.0 (TID 7538) can not write to output file: org.apache.hadoop.fs.FileAlreadyExistsException: File already exists:s3://bucket-name/directory_name/run=2023-05-24-13-01-43/output=good/vendor=com.snowplowanalytics.mobile/name=application/format=tsv/model=1/part-02684-62c861b9-eda4-4c36-b3d9-0d3b9d28870e.c000.txt.gz

I am using this RDB Transformer version: s3://snowplow-hosted-assets/4-storage/transformer-batch/snowplow-transformer-batch-4.2.0.jar

I had seen this error only once before, in several months of running Snowplow Streaming.

How I currently get rid of the error: I delete the failed transformed directory and rerun the EMR cluster on the same data. After that, the error does not reappear.
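The cleanup step of this workaround can be scripted. Below is a minimal sketch, assuming the file named in the Spark error always sits under a `run=<timestamp>` directory, as in the log above. The function name `failed_run_prefix` is hypothetical; it only extracts the run directory URI that would be deleted before the rerun and does not touch S3 itself.

```python
import re
from typing import Optional

# Matches the s3:// URI that follows "File already exists:" in the Spark error text.
_FAILED_FILE = re.compile(r"File already exists:\s*(s3://\S+)")


def failed_run_prefix(error_message: str) -> Optional[str]:
    """Return the run=... directory URI containing the conflicting file, or None."""
    match = _FAILED_FILE.search(error_message)
    if match is None:
        return None
    parts = match.group(1).split("/")
    # Keep every path segment up to and including the "run=<timestamp>" directory.
    for i, part in enumerate(parts):
        if part.startswith("run="):
            return "/".join(parts[: i + 1]) + "/"
    return None
```

The returned prefix (for the error above, `s3://bucket-name/directory_name/run=2023-05-24-13-01-43/`) could then be passed to `aws s3 rm <prefix> --recursive` before rerunning the EMR step, assuming nothing else in that run directory needs to be kept.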

What could be causing this error? Is there a way to prevent it from happening again?
