As SLURM is a manager for computer clusters, we are wondering if it is possible to extend the pool to other servers that have the executables installed. These would not necessarily be CentOS servers, to allow use on existing clusters.
Some prerequisites come to mind:
- SLURM: add the pool to the slurm config file (obviously)
- OTB apps: compile and install on other nodes
- python scripts: copy them so they are available on the other nodes
- custom paths: as clusters are often used by a variety of applications and users, it is likely that the libraries are not installed in the usual locations, to allow parallel and differing builds. Is it possible to add a “source” script at the beginning of the sbatch script as an option? That would give access to the Python files (via PYTHONPATH), the libraries (LD_LIBRARY_PATH), and the applications (OTB_APPLICATION_PATH)
- orchestrator: is there anything to do? I do not understand where the triggers that record success or failure status in the DB are implemented. Is it file-based, does the sbatch script send a message, or is it something else?
- other prerequisites?
The system does use SLURM to run the processing jobs, so in theory it’s possible to set it up on multiple servers. Point by point:
> SLURM: add the pool to the slurm config file (obviously)
Yes, the SLURM cluster should be configured properly.
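For illustration, a minimal `slurm.conf` fragment declaring extra compute nodes might look like the following; the hostnames, CPU counts, and memory figures are placeholders, not values from the Sen2Agri documentation:

```
# Hypothetical fragment: node names and hardware specs are placeholders.
NodeName=proc-node[1-3] CPUs=16 RealMemory=64000 State=UNKNOWN
PartitionName=sen2agri Nodes=proc-node[1-3] Default=YES MaxTime=INFINITE State=UP
```

After editing the file on all nodes, the configuration is usually reloaded with `scontrol reconfigure`.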
> OTB apps: compile and install on other nodes

Yes, or you can install the `sen2agri-processors` RPM if you’re using our binary distribution.
> python scripts: copy them so they are available on the other nodes
The relevant ones should be in that package. The downloaders should only run on a single node.
> custom paths: as clusters are often used by a variety of applications and users
Not right now, the applications assume a system-wide install.
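That said, the environment-file approach suggested in the question could be sketched roughly as follows. All paths below are assumptions for a non-standard install prefix; adjust them to the actual layout on your cluster:

```shell
#!/bin/bash
# Hypothetical environment file for a non-standard install prefix.
# The prefix and subdirectory layout are assumptions, not documented paths.
PREFIX=/opt/sen2agri

# Make the Python modules, shared libraries and OTB applications visible.
export PYTHONPATH="$PREFIX/lib/python2.7/site-packages${PYTHONPATH:+:$PYTHONPATH}"
export LD_LIBRARY_PATH="$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export OTB_APPLICATION_PATH="$PREFIX/lib/otb/applications"
```

An sbatch script would then start with something like `source /opt/sen2agri/env.sh` before invoking the processors, but as noted above the applications currently assume a system-wide install, so this remains untested.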
> orchestrator: is there anything to do?
There should be a single instance of the orchestrator and executor daemons. The jobs try to report their completion to the executor, using the IP address configured in the database (the `executor.listen-ip` key in the `config` table). This defaults to `127.0.0.1` and should be changed to that node’s address.
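As a sketch, the change could look like the SQL below. The column names (`key`, `value`) and the address are assumptions; check the actual schema of the `config` table before running anything:

```sql
-- Hypothetical: column names are assumptions; verify against your schema.
UPDATE config
   SET value = '192.168.1.10'          -- placeholder: the executor node's address
 WHERE key = 'executor.listen-ip';
```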
> other prerequisites?
Not as far as I know, but please note that this scenario hasn’t really been tested properly. Sorry for the bad news.
My preference would be to provide a couple of Docker images to simplify installation on shared nodes, but this has yet to happen.