SLURM issues after update to v2.0

Hi,

After updating our machine from v1.7 to v2.0 (with all patches applied), we’re having trouble executing L4 jobs in automatic mode.

The job gets submitted but never starts running. I found a similar topic on this forum for a previous version, but the suggestions there didn’t help.

SLURM looks fine:

● slurmdbd.service - Slurm DBD accounting daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmdbd.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2019-06-05 13:16:59 CEST; 1 weeks 6 days ago
 Main PID: 9460 (slurmdbd)
   CGroup: /system.slice/slurmdbd.service
           └─9460 /usr/sbin/slurmdbd

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2019-06-05 13:16:53 CEST; 1 weeks 6 days ago
 Main PID: 8693 (slurmctld)
   CGroup: /system.slice/slurmctld.service
           └─8693 /usr/sbin/slurmctld

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

● slurmd.service - Slurm node daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/slurmd.service.d
           └─override.conf
   Active: active (running) since Tue 2019-06-18 13:56:26 CEST; 5min ago
  Process: 1167 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 1171 (slurmd)
   CGroup: /system.slice/slurmd.service
           └─1171 /usr/sbin/slurmd

Jun 18 13:56:26 wfp.sen2agri systemd[1]: Starting Slurm node daemon...
Jun 18 13:56:26 wfp.sen2agri systemd[1]: PID file /var/run/slurmd.pid not readable (yet?) after start.
Jun 18 13:56:26 wfp.sen2agri systemd[1]: Started Slurm node daemon.

The orchestrator looks fine as well (I even tried restarting it):

...
Jun 18 12:39:09 wfp.sen2agri sen2agri-orchestrator[6767]: Using L2A tile: /mnt/archive/maccs_def/syr_homs/l2a/S2B_MSIL2A_20190426T081609_N0207_R121_T37SCU_20190426T104931.SAFE/SENTINEL2B_20190426-083031-560_L2A_T37SCU_C_V1-0/SENTINEL2B_20
Jun 18 12:39:09 wfp.sen2agri sen2agri-orchestrator[6767]: Processing task runnable event with processor id 5 task id 2 and job id 2
Jun 18 13:41:42 wfp.sen2agri systemd[1]: Stopping Orchestrator for Sen2Agri...
Jun 18 13:41:42 wfp.sen2agri systemd[1]: Stopped Orchestrator for Sen2Agri.
Jun 18 13:41:46 wfp.sen2agri systemd[1]: Starting Orchestrator for Sen2Agri...
Jun 18 13:41:46 wfp.sen2agri sen2agri-orchestrator[32194]: Reading settings from /etc/sen2agri/sen2agri.conf
Jun 18 13:41:46 wfp.sen2agri systemd[1]: Started Orchestrator for Sen2Agri.
Jun 18 13:42:46 wfp.sen2agri sen2agri-orchestrator[32194]: Processing job paused event with job id 1
Jun 18 13:45:46 wfp.sen2agri sen2agri-orchestrator[32194]: Processing job resumed event with job id 1
Jun 18 13:45:46 wfp.sen2agri sen2agri-orchestrator[32194]: Processing task runnable event with processor id 5 task id 1 and job id 1
Jun 18 13:56:56 wfp.sen2agri sen2agri-orchestrator[32194]: Processing job paused event with job id 1
Jun 18 13:57:26 wfp.sen2agri sen2agri-orchestrator[32194]: Processing job resumed event with job id 1
Jun 18 13:57:26 wfp.sen2agri sen2agri-orchestrator[32194]: Processing task runnable event with processor id 5 task id 1 and job id 1
Jun 18 13:57:58 wfp.sen2agri systemd[1]: Stopping Orchestrator for Sen2Agri...
Jun 18 13:57:58 wfp.sen2agri systemd[1]: Stopped Orchestrator for Sen2Agri.
Jun 18 13:57:58 wfp.sen2agri systemd[1]: Starting Orchestrator for Sen2Agri...
Jun 18 13:57:58 wfp.sen2agri sen2agri-orchestrator[1374]: Reading settings from /etc/sen2agri/sen2agri.conf
Jun 18 13:57:58 wfp.sen2agri systemd[1]: Started Orchestrator for Sen2Agri.

I tried running sudo scontrol reconfigure and restarting slurmd, but the jobs still don’t start.

Here’s the output of /var/log/slurm/slurm.log, which shows an apparent issue:

[2019-06-18T12:38:59.477] _job_create: invalid account or partition for user 51600, account 'sen2agri-service', and partition 'sen2agri'
[2019-06-18T12:38:59.525] _slurm_rpc_submit_batch_job: Invalid account or account/partition combination specified
[2019-06-18T12:39:09.868] _job_create: invalid account or partition for user 51600, account 'sen2agri-service', and partition 'sen2agri'
[2019-06-18T12:39:09.868] _slurm_rpc_submit_batch_job: Invalid account or account/partition combination specified
[2019-06-18T13:45:46.631] _job_create: invalid account or partition for user 51600, account 'sen2agri-service', and partition 'sen2agri'
[2019-06-18T13:45:46.632] _slurm_rpc_submit_batch_job: Invalid account or account/partition combination specified
[2019-06-18T13:52:50.118] error: User 1000 not found
[2019-06-18T13:52:50.118] _job_create: invalid account or partition for user 1000, account '(null)', and partition 'sen2agri'
[2019-06-18T13:52:50.118] _slurm_rpc_allocate_resources: Invalid account or account/partition combination specified 
[2019-06-18T13:56:11.799] Processing RPC: REQUEST_RECONFIGURE from uid=1000
[2019-06-18T13:56:11.799] error: Security violation, RECONFIGURE RPC from uid=1000
[2019-06-18T13:56:11.799] error: _slurm_rpc_reconfigure_controller: Invalid user id
[2019-06-18T13:56:15.753] Processing RPC: REQUEST_RECONFIGURE from uid=0
[2019-06-18T13:56:15.883] restoring original state of nodes
[2019-06-18T13:56:15.883] cons_res: select_p_node_init
[2019-06-18T13:56:15.884] cons_res: preparing for 2 partitions
[2019-06-18T13:56:15.908] read_slurm_conf: backup_controller not specified.
[2019-06-18T13:56:15.908] cons_res: select_p_reconfigure
[2019-06-18T13:56:15.909] cons_res: select_p_node_init
[2019-06-18T13:56:15.909] cons_res: preparing for 2 partitions
[2019-06-18T13:56:15.939] _slurm_rpc_reconfigure_controller: completed usec=185865
[2019-06-18T13:56:16.955] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=0
[2019-06-18T13:57:26.648] _job_create: invalid account or partition for user 51600, account 'sen2agri-service', and partition 'sen2agri'
[2019-06-18T13:57:26.649] _slurm_rpc_submit_batch_job: Invalid account or account/partition combination specified
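
To make the pattern in the log easier to see, here is a minimal Python sketch that groups the rejected submissions by user ID, account, and partition. The names rejected_combinations, REJECT, and LOG are only for illustration (they are not part of Sen2Agri or SLURM), the sample is a few lines copied from the excerpt above, and the regex assumes the message format stays exactly as shown there.

```python
import re
from collections import Counter

# A few lines copied from the slurm.log excerpt above (illustrative sample only).
LOG = """\
[2019-06-18T12:38:59.477] _job_create: invalid account or partition for user 51600, account 'sen2agri-service', and partition 'sen2agri'
[2019-06-18T12:38:59.525] _slurm_rpc_submit_batch_job: Invalid account or account/partition combination specified
[2019-06-18T13:52:50.118] error: User 1000 not found
[2019-06-18T13:52:50.118] _job_create: invalid account or partition for user 1000, account '(null)', and partition 'sen2agri'
[2019-06-18T13:57:26.648] _job_create: invalid account or partition for user 51600, account 'sen2agri-service', and partition 'sen2agri'
"""

# Assumed message format, taken from the log lines above.
REJECT = re.compile(
    r"_job_create: invalid account or partition for user (?P<uid>\d+), "
    r"account '(?P<account>[^']*)', and partition '(?P<partition>[^']*)'"
)


def rejected_combinations(log_text):
    """Count each (uid, account, partition) triple found in rejection lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = REJECT.search(line)
        if match:
            key = (int(match["uid"]), match["account"], match["partition"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    for (uid, account, partition), n in rejected_combinations(LOG).items():
        print(f"uid={uid} account={account} partition={partition} rejected {n}x")
```

In the full excerpt, every rejection logged around the orchestrator's task events is for uid 51600 with account 'sen2agri-service' and partition 'sen2agri'. The single uid 1000 entry has no account set at all.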

Thanks,

Val