Data loss in RDBLoader when relaunching on Fargate?

Hi all,

We are using RDBLoader R35 and have the following questions:

  1. In our CI/CD we update the RDBLoader image on every deployment, which forces a new task definition on ECS Fargate. As a result, the current Fargate task is stopped and a new one is launched. The question is whether killing the old RDBLoader task leads to data loss in Redshift if there are still unprocessed messages on the SQS queue.

  2. We are already seeing some runs in the shredded bucket that have not been loaded to Redshift, and we are considering writing a script that scans the shredded bucket, finds the runs that have not been loaded yet, and puts their shredding_complete.json on SQS again. Is there an alternative approach you would suggest?
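For reference, the core of the scan described in point 2 could look like the minimal sketch below. It assumes the object keys have already been listed from the shredded archive, that each key has the layout `<run-folder>/<rest-of-path>` with shredding_complete.json at the root of the run folder, and that the set of already-loaded run folders has been read from the load manifest. `classify_runs` and that key layout are hypothetical and not part of RDBLoader.

```python
from __future__ import annotations

from typing import Iterable

# Marker file name as mentioned in this thread.
COMPLETION_FILE = "shredding_complete.json"


def classify_runs(keys: Iterable[str], loaded_runs: set[str]) -> dict[str, list[str]]:
    """Split archived run folders into 'unloaded' and 'incomplete'.

    Hypothetical layout: keys look like '<run-folder>/<rest-of-path>',
    relative to the archive root. loaded_runs holds the run folders
    already recorded as loaded (assumed to come from the manifest).
    """
    runs: dict[str, set[str]] = {}
    for key in keys:
        folder, _, rest = key.partition("/")
        if not rest:
            continue  # key is not inside a run folder; ignore it
        runs.setdefault(folder, set()).add(rest)

    # Folders without the marker at their root: do not re-send, investigate.
    incomplete = sorted(f for f, files in runs.items() if COMPLETION_FILE not in files)
    # Folders with the marker that were never loaded: candidates for re-sending.
    unloaded = sorted(
        f for f, files in runs.items()
        if COMPLETION_FILE in files and f not in loaded_runs
    )
    return {"unloaded": unloaded, "incomplete": incomplete}
```

The keys to re-send would then be `[f"{run}/{COMPLETION_FILE}" for run in result["unloaded"]]`. Folders without a marker are kept apart on purpose, since re-sending a message for a half-written run would not help.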


Hi @dadasami,

  1. No. The messages remain in SQS and the next launch will handle them properly. Moreover, even if your CI/CD kills the app in the middle of loading, it is still safe to assume that the same message will come back to the loader, because it has not been ack'ed, and ack'ing happens only after the transaction has been committed. TL;DR: it's completely safe.
  2. Yes, we are facing the same problem. We monitor CloudWatch logs carefully for any exceptions, but it is still possible to miss one through human error. We are currently working on version 1.2.0, which builds this monitoring into the Loader: it will list the archive on a schedule and report any corrupted folders (the folder is in the archive, but has no shredding_complete.json file) and unloaded folders (the folder is in the archive, but not in the manifest).
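To illustrate why point 1 holds, here is a toy, in-memory model of "ack only after commit". `FakeQueue` and `run_loader` are hypothetical and are neither RDBLoader's code nor the boto3 API; in real SQS the ack corresponds to deleting the message, and redelivery happens when the visibility timeout of an undeleted message expires.

```python
from collections import deque


class FakeQueue:
    """In-memory stand-in for an SQS queue (hypothetical, simplified).

    A received message stays 'in flight' until delete() is called.
    expire_visibility() makes in-flight messages visible again, which is
    what SQS does once the visibility timeout runs out after a crash.
    """

    def __init__(self, messages):
        self.visible = deque(messages)
        self.in_flight = set()

    def receive(self):
        if not self.visible:
            return None
        msg = self.visible.popleft()
        self.in_flight.add(msg)
        return msg

    def delete(self, msg):
        # The "ack": the message is gone for good only after this call.
        self.in_flight.discard(msg)

    def expire_visibility(self):
        self.visible.extend(sorted(self.in_flight))
        self.in_flight.clear()


def run_loader(queue, warehouse, crash_before_commit=None):
    """Load messages one by one, acking each only after its commit."""
    while (msg := queue.receive()) is not None:
        if msg == crash_before_commit:
            return False  # task stopped mid-load: nothing committed, no ack
        warehouse.append(msg)  # stands in for the load transaction + COMMIT
        queue.delete(msg)      # ack strictly after the commit
    return True
```

If the old task is stopped while loading `run=2`, that message was never deleted, so the new task picks it up once it becomes visible again. Standard SQS queues do not guarantee ordering, so the toy model only promises that every message is eventually loaded, not in which order.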