And slurmd is still running? What do scontrol show node and sinfo show?
slurmd status:
● slurmd.service - Slurm node daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/slurmd.service.d
           └─override.conf
   Active: active (running) since Tue 2018-03-13 09:49:24 UTC; 2h 5min ago
  Process: 889 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 974 (slurmd)
   CGroup: /system.slice/slurmd.service
           └─974 /usr/sbin/slurmd
Mar 13 09:49:24 benchmark-32-4sites.novalocal systemd[1]: Starting Slurm node daem...
Mar 13 09:49:24 benchmark-32-4sites.novalocal systemd[1]: PID file /var/run/slurmd...
Mar 13 09:49:24 benchmark-32-4sites.novalocal systemd[1]: Started Slurm node daemon.
scontrol show node:
NodeName=localhost Arch=x86_64 CoresPerSocket=1
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=0.03 Features=(null)
   Gres=(null)
   NodeAddr=localhost NodeHostName=localhost Version=15.08
   OS=Linux RealMemory=1 AllocMem=0 FreeMem=7091 Sockets=8 Boards=1
   State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A
   BootTime=2018-03-13T09:49:10 SlurmdStartTime=2018-03-13T09:49:24
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=Low socket*core*thread count, Low CPUs [slurm@2018-03-05T12:44:48]
sinfo:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
sen2agri* up infinite 1 drain localhost
sen2agriHi up infinite 1 drain localhost
I believe you need to run scontrol reconfigure, and maybe restart slurmd (systemctl restart slurmd), after editing /etc/slurm/slurm.conf.
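The drain Reason shown above ("Low socket*core*thread count, Low CPUs") usually means the NodeName line in slurm.conf declares a different topology than what slurmd actually detects. As a sketch only, a NodeName entry consistent with the topology reported above (8 sockets × 1 core × 1 thread) might look like this; the RealMemory value here is an assumption, so compare against what `slurmd -C` prints on the node:

```
# Hypothetical slurm.conf entry matching the detected hardware above.
# RealMemory (in MB) is illustrative; take the exact values from `slurmd -C`.
NodeName=localhost CPUs=8 Sockets=8 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=7000 State=UNKNOWN
```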
sinfo -R will give you the reason your nodes are in the drain state, and you can resume them with scontrol update NodeName=localhost state=resume.
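Putting the steps together, the recovery sequence might look like the following (run as root on the node; this is a sketch and assumes slurm.conf now matches the hardware):

```
# Re-read slurm.conf on the controller and the node daemons
scontrol reconfigure

# If the node still reports the old topology, restart the node daemon
systemctl restart slurmd

# Check why the node is drained, then clear the drain state
sinfo -R
scontrol update NodeName=localhost State=RESUME
```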