We are running the GCSLoader component as Dataflow jobs in our production account. We are currently on GCSLoader version 0.3.2, which uses Apache Beam version 2.34.0.
We have received an email from the GCP team stating that there is a bug in Apache Beam versions 2.32.0 to 2.37.0 (inclusive) that affects file-based sources using GCS, gcsfilesystem, or gcsio.
They recommend upgrading your version of Beam to 2.38.0. Template jobs are not affected.
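For anyone checking their own jobs against that range, here is a minimal Python sketch of the version comparison. The function and constant names are hypothetical, and it assumes plain `major.minor.patch` version strings; the affected range is taken directly from the email described above.

```python
from typing import Tuple

# Affected range as quoted from the GCP email: 2.32.0 to 2.37.0, inclusive.
AFFECTED_MIN = (2, 32, 0)
AFFECTED_MAX = (2, 37, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a 'major.minor.patch' string into a tuple of ints (assumed format)."""
    return tuple(int(part) for part in version.split("."))


def is_affected(beam_version: str) -> bool:
    """Return True if the Beam version falls inside the reported range."""
    return AFFECTED_MIN <= parse_version(beam_version) <= AFFECTED_MAX


print(is_affected("2.34.0"))  # Beam version used by GCSLoader 0.3.2
print(is_affected("2.38.0"))  # recommended upgrade target
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as `"2.9.0"` sorting after `"2.38.0"`.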
Could you please let me know whether there is a plan to release a new version of GCSLoader that uses Apache Beam version 2.38.0 or above?
Hi @srashti_vishwakarma, I opened this issue on GitHub to upgrade GCS Loader to Beam 2.38.0, and there is already a PR open to address it. We will announce here on Discourse once the new version is released, and I expect it will be quite soon.
Thank you for your response. I would like to understand the impact of this Beam bug on the Snowplow GCSLoader component.
Specifically, if we wait for you to release the new GCSLoader version and only then upgrade our Dataflow jobs, what impact could the bug have in the meantime?
As of now, our GCSLoader jobs are working fine.
Hi @srashti_vishwakarma, as announced over in this thread, we have just released GCS loader version 0.5.0 which uses Apache Beam version 2.38.0. I hope this solves the problem for you.