How to enable slurm services

lnicola · January 23, 2018, 9:22am

So slurmdbd is now running, but slurmctld is not. Can you check /var/log/slurm/slurm.log?
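For reference, a minimal sketch of that check. It assumes a systemd-based host and the log path named above; `slurm_log_problems` is a hypothetical helper name, not part of Slurm.

```shell
# Hypothetical helper: print only the "error:" and "fatal:" lines of a Slurm log.
slurm_log_problems() {
    grep -E '^\[[^]]+\] (error|fatal):' "$1"
}

# Assumed log path (the one mentioned above); adjust if your setup logs elsewhere.
log=/var/log/slurm/slurm.log

# Show whether each daemon is active, if systemd is available on this host.
if command -v systemctl >/dev/null 2>&1; then
    systemctl is-active slurmdbd slurmctld || true
fi

# Show the most recent problems, if the log is readable by the current user.
if [ -r "$log" ]; then
    slurm_log_problems "$log" | tail -n 20
fi
```

A `fatal:` line is normally the last thing slurmctld writes before it exits, so it is the first place to look.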

brentf · January 24, 2018, 3:39am

Hello,

Here is the result:

[2018-01-23T11:57:25.355] layouts: no layout to initialize
[2018-01-23T11:57:25.448] layouts: loading entities/relations information
[2018-01-23T11:57:25.449] error: Could not open node state file /var/spool/slurm/node_state: No such file or directory
[2018-01-23T11:57:25.449] error: NOTE: Trying backup state save file. Information may be lost!
[2018-01-23T11:57:25.449] No node state file (/var/spool/slurm/node_state.old) to recover
[2018-01-23T11:57:25.449] error: Incomplete node data checkpoint file
[2018-01-23T11:57:25.449] Recovered state of 0 nodes
[2018-01-23T11:57:25.449] error: Could not open job state file /var/spool/slurm/job_state: No such file or directory
[2018-01-23T11:57:25.449] error: NOTE: Trying backup state save file. Jobs may be lost!
[2018-01-23T11:57:25.449] No job state file (/var/spool/slurm/job_state.old) to recover
[2018-01-23T11:57:25.456] cons_res: select_p_node_init
[2018-01-23T11:57:25.456] cons_res: preparing for 2 partitions
[2018-01-23T11:57:25.456] error: Could not open reservation state file /var/spool/slurm/resv_state: No such file or directory
[2018-01-23T11:57:25.456] error: NOTE: Trying backup state save file. Reservations may be lost
[2018-01-23T11:57:25.456] No reservation state file (/var/spool/slurm/resv_state.old) to recover
[2018-01-23T11:57:25.456] Recovered state of 0 reservations
[2018-01-23T11:57:25.456] error: Could not open trigger state file /var/spool/slurm/trigger_state: No such file or directory
[2018-01-23T11:57:25.456] error: NOTE: Trying backup state save file. Triggers may be lost!
[2018-01-23T11:57:25.456] No trigger state file (/var/spool/slurm/trigger_state.old) to recover
[2018-01-23T11:57:25.456] error: Incomplete trigger data checkpoint file
[2018-01-23T11:57:25.458] read_slurm_conf: backup_controller not specified.
[2018-01-23T11:57:25.458] Reinitializing job accounting state
[2018-01-23T11:57:25.458] Ending any jobs in accounting that were running when controller went down on
[2018-01-23T11:57:25.458] cons_res: select_p_reconfigure
[2018-01-23T11:57:25.458] cons_res: select_p_node_init
[2018-01-23T11:57:25.458] cons_res: preparing for 2 partitions
[2018-01-23T11:57:25.458] Running as primary controller
[2018-01-23T11:57:25.458] Registering slurmctld at port 6817 with slurmdbd.
[2018-01-23T11:57:25.579] error: slurmdbd: Issue with call DBD_REGISTER_CTLD(1434): 4294967295(This cluster hasn't been added to accounting yet)
[2018-01-23T11:57:25.579] fatal: You need to add this cluster to accounting if you want to enforce associations, or no jobs will ever run.

Regards.

lnicola · January 24, 2018, 9:25am

You might be able to fix that by running sudo sacctmgr add cluster sen2agri, but again, the installer should have done so. It’s really strange.
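The name passed to sacctmgr has to match the ClusterName set in slurm.conf. A rough sketch of the manual fix, assuming the config lives at /etc/slurm/slurm.conf (an assumption, adjust for your install); `cluster_name_from_conf` is a hypothetical helper, not a Slurm command.

```shell
# Hypothetical helper: print the ClusterName value from a slurm.conf-style file.
# Commented-out lines are skipped; if the key appears twice, the last one wins.
cluster_name_from_conf() {
    sed -n 's/^[[:space:]]*[Cc]luster[Nn]ame[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$1" | tail -n 1
}

# Assumed config path; adjust to wherever your installation keeps slurm.conf.
conf=/etc/slurm/slurm.conf

if [ -r "$conf" ] && command -v sacctmgr >/dev/null 2>&1; then
    name=$(cluster_name_from_conf "$conf")
    if [ -n "$name" ]; then
        # -i answers the confirmation prompt automatically.
        sudo sacctmgr -i add cluster "$name"
        # slurmctld exited on the fatal error, so it has to be started again.
        sudo systemctl restart slurmctld
    fi
fi
```

Reading the name from the config instead of typing it avoids registering a cluster under a name that slurmctld will not ask for.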

brentf · January 25, 2018, 8:04am

Hello,

Actually, my colleague asked me to uninstall and reinstall sen2agri (for evaluation purposes), and it seems that the slurm services are now running:

Screenshot at 2018-01-25 15-39-50

The same with mariadb:

Screenshot at 2018-01-25 15-54-47

But some errors still show up when I run sudo cat /var/log/slurm/slurmdbd.log:

Screenshot at 2018-01-25 15-56-37

Regards.

lnicola · January 29, 2018, 3:15pm

Hi,

Maybe a restart will help, but – assuming that the configuration files are there and the install script was run – I don’t understand what could cause this error.

Innocturion · April 19, 2021, 6:01pm

Hi,

Unfortunately, I have to reopen this issue.

On sen2agri-2.0.3 this error occurs and we were not able to successfully run slurm from the install script.
The error message is always:
slurmdbd: Issue with call DBD_REGISTER_CTLD(1434): 4294967295(This cluster hasn't been added to accounting yet)

On our test machine I manually reinitialized the slurm services and redid the cluster and qos registration, and it somehow works now. I have no clue why. Any ideas?
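One way to see what such a manual re-registration actually changed is to ask slurmdbd what it knows. This is only a sketch: it assumes sacctmgr is on the PATH and that the cluster is named sen2agri as earlier in this thread; `is_registered` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the name in $1 appears as a whole line on stdin.
is_registered() {
    grep -qxF -- "$1"
}

if command -v sacctmgr >/dev/null 2>&1; then
    # -n drops the header, -P prints parsable output without padding.
    if sacctmgr -n -P show cluster format=cluster | is_registered sen2agri; then
        echo "cluster sen2agri is registered with slurmdbd"
    else
        echo "cluster sen2agri is missing from accounting"
    fi
    # List the QOS entries currently known to accounting.
    sacctmgr -n -P show qos format=name
fi
```

Running this before and after the install script would show whether the script's registration step is the part that fails.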