Factotum is currently a single-node jobflow runner: there is no built-in support for running multiple Factotum workers in a distributed fashion (unlike Chronos or similar).
However, there is a strategy you can use, based on an idea in Kyle Kingsbury’s Jepsen analysis of Chronos:
> you might consider shipping cronfiles directly to redundant nodes and
> having tasks coordinate through a consensus system–it could, depending
> on your infrastructure reliability and need for load-balancing, be
> simpler and more reliable
Provided that your jobs:
- Can detect if another instance of the same job has started
- Will exit gracefully (returning a distinct no-op exit code) if another instance has already started
then you can potentially push the same cron file containing Factotum commands to multiple servers for execution.
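The two requirements above can be sketched as a small wrapper around the job. This is only an illustration under stated assumptions, not part of Factotum: `try_claim`, `run_once`, and `NOOP_EXIT_CODE` are hypothetical names, the value `3` is arbitrary, and the lock file stands in for whatever consensus system or shared store your servers can all reach. The lock path is assumed to be unique per scheduled run (for example, it includes the run date), so a claim from one run never blocks the next.

```python
import os

# Hypothetical exit code meaning "another instance already claimed this run".
NOOP_EXIT_CODE = 3


def try_claim(lock_path):
    """Atomically create the lock file; return True only if we created it."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so at most one
        # caller can succeed for a given path.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the claimant for debugging
    return True


def run_once(lock_path, job):
    """Run job() if we win the claim, otherwise report a no-op."""
    if not try_claim(lock_path):
        return NOOP_EXIT_CODE
    job()
    return 0
```

A job script would pass the result of `run_once` to `sys.exit`, so the caller can tell a real run (`0`) apart from a skipped one (`3` in this sketch). The claim is deliberately never released: it marks that particular scheduled run as taken, which is why the path must differ from run to run.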
Some provisos:
- There could still be race conditions if two jobs start at exactly the same time, since each could check for the other before either has recorded that it started
- If one of your servers dies during a run, the jobs running on it at the time will not complete and will not be rescheduled, so this isn’t a true high-availability solution
We’ll update this article when Factotum has built-in support for distribution.