How to enable slurm services

So slurmdbd is now running, but slurmctld is not. Can you check /var/log/slurm/slurm.log?

Hello,

Here is the result:

[2018-01-23T11:57:25.355] layouts: no layout to initialize
[2018-01-23T11:57:25.448] layouts: loading entities/relations information
[2018-01-23T11:57:25.449] error: Could not open node state file /var/spool/slurm/node_state: No such file or directory
[2018-01-23T11:57:25.449] error: NOTE: Trying backup state save file. Information may be lost!
[2018-01-23T11:57:25.449] No node state file (/var/spool/slurm/node_state.old) to recover
[2018-01-23T11:57:25.449] error: Incomplete node data checkpoint file
[2018-01-23T11:57:25.449] Recovered state of 0 nodes
[2018-01-23T11:57:25.449] error: Could not open job state file /var/spool/slurm/job_state: No such file or directory
[2018-01-23T11:57:25.449] error: NOTE: Trying backup state save file. Jobs may be lost!
[2018-01-23T11:57:25.449] No job state file (/var/spool/slurm/job_state.old) to recover
[2018-01-23T11:57:25.456] cons_res: select_p_node_init
[2018-01-23T11:57:25.456] cons_res: preparing for 2 partitions
[2018-01-23T11:57:25.456] error: Could not open reservation state file /var/spool/slurm/resv_state: No such file or directory
[2018-01-23T11:57:25.456] error: NOTE: Trying backup state save file. Reservations may be lost
[2018-01-23T11:57:25.456] No reservation state file (/var/spool/slurm/resv_state.old) to recover
[2018-01-23T11:57:25.456] Recovered state of 0 reservations
[2018-01-23T11:57:25.456] error: Could not open trigger state file /var/spool/slurm/trigger_state: No such file or directory
[2018-01-23T11:57:25.456] error: NOTE: Trying backup state save file. Triggers may be lost!
[2018-01-23T11:57:25.456] No trigger state file (/var/spool/slurm/trigger_state.old) to recover
[2018-01-23T11:57:25.456] error: Incomplete trigger data checkpoint file
[2018-01-23T11:57:25.458] read_slurm_conf: backup_controller not specified.
[2018-01-23T11:57:25.458] Reinitializing job accounting state
[2018-01-23T11:57:25.458] Ending any jobs in accounting that were running when controller went down on
[2018-01-23T11:57:25.458] cons_res: select_p_reconfigure
[2018-01-23T11:57:25.458] cons_res: select_p_node_init
[2018-01-23T11:57:25.458] cons_res: preparing for 2 partitions
[2018-01-23T11:57:25.458] Running as primary controller
[2018-01-23T11:57:25.458] Registering slurmctld at port 6817 with slurmdbd.
[2018-01-23T11:57:25.579] error: slurmdbd: Issue with call DBD_REGISTER_CTLD(1434): 4294967295(This cluster hasn't been added to accounting yet)
[2018-01-23T11:57:25.579] fatal: You need to add this cluster to accounting if you want to enforce associations, or no jobs will ever run.

Regards.

You might be able to fix that by running sudo sacctmgr add cluster sen2agri, but again, the installer should have done so. It’s really strange.
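
The suggestion above can be sketched as a small script that adds the cluster only if slurmdbd does not already list it, then restarts the controller. This is a rough sketch, not what the installer does: the function names `cluster_in_list` and `ensure_cluster` are made up for illustration, the cluster name `sen2agri` is taken from this thread, and it assumes a systemd unit named `slurmctld`.

```shell
#!/bin/sh
# Sketch: register a cluster with slurmdbd only when it is missing.
# Function names are hypothetical; the cluster name comes from this thread.

# Succeeds if the name given as $1 appears as a whole line on stdin.
cluster_in_list() {
    grep -qxF -- "$1"
}

# Adds the cluster via sacctmgr when it is not listed, then restarts
# slurmctld (assumes a systemd unit of that name). Must run as root.
ensure_cluster() {
    cluster="${1:-sen2agri}"
    if sacctmgr -n -P show cluster format=cluster | cluster_in_list "$cluster"; then
        echo "cluster $cluster is already registered"
    else
        sacctmgr -i add cluster "$cluster" && systemctl restart slurmctld
    fi
}
```

After sourcing the file as root, `ensure_cluster sen2agri` would perform the check and, if needed, the registration. The `-i` flag makes sacctmgr commit without asking for confirmation.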

Hello,

Actually, my colleague asked me to uninstall and reinstall sen2agri (for evaluation purposes), and the slurm services now seem to be running:

Screenshot at 2018-01-25 15-39-50

The same with mariadb:

Screenshot at 2018-01-25 15-54-47

But an error still shows up when I run sudo cat /var/log/slurm/slurmdbd.log:

Screenshot at 2018-01-25 15-56-37

Regards.

Hi,

Maybe a restart will help, but – assuming that the configuration files are there and the install script was run – I don’t understand what could cause this error.

Hi,

Unfortunately, I have to reopen this issue.

With sen2agri-2.0.3 this error still occurs, and we were not able to get slurm running from the install script.
The error message is always:
slurmdbd: Issue with call DBD_REGISTER_CTLD(1434): 4294967295(This cluster hasn’t been added to accounting yet)

On our test machine I manually reinitialized the slurm services and redid the cluster and qos registration, and it somehow works now. I have no clue why. Any ideas?
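
To check whether another machine is hitting this same failure, one could search the slurmctld log for the message quoted above. A minimal sketch, assuming the log path mentioned earlier in the thread (/var/log/slurm/slurm.log); the function name `has_accounting_error` is hypothetical:

```shell
#!/bin/sh
# Sketch: detect the "cluster not added to accounting" failure in a log.

# Succeeds if the log text on stdin contains the registration error.
# The pattern skips the apostrophe so it matches both straight and
# curly quote variants of "hasn't".
has_accounting_error() {
    grep -q "been added to accounting yet"
}
```

For example, `has_accounting_error < /var/log/slurm/slurm.log && echo "cluster is not registered with slurmdbd"` would print the hint only when the error is present.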