How to enable slurm services

[2018-01-03T17:19:58.779] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:19:58.779] error: The database must be up when starting the MYSQL plugin.  Trying again in 5 seconds.
[2018-01-03T17:20:03.780] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:20:03.780] error: The database must be up when starting the MYSQL plugin.  Trying again in 5 seconds.
[2018-01-03T17:20:08.780] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:20:08.780] error: The database must be up when starting the MYSQL plugin.  Trying again in 5 seconds.
[2018-01-03T17:20:13.781] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:20:13.781] error: The database must be up when starting the MYSQL plugin.  Trying again in 5 seconds.
[2018-01-03T17:22:35.924] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:22:35.924] error: The database must be up when starting the MYSQL plugin.  Trying again in 5 seconds.
[2018-01-03T17:22:40.925] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)

This is only a snippet of the error, but the errors are all the same.

That means the SLURM accounting database, or at least its slurm user, is not set up correctly. You previously mentioned doing a manual installation; that could be the cause.

The install script runs the following (as root):

    yum -y install mariadb-server mariadb
    systemctl start mariadb
    systemctl enable mariadb
    # the script answers 'y' to every prompt, but doesn't change the root password
    mysql_secure_installation
    # this will ask for the root password, which is empty at this point
    mysql -u root -p -e "create database slurm_acct_db;create user slurm@localhost;set password for slurm@localhost = password('sen2agri');grant usage on *.* to slurm;grant all privileges on slurm_acct_db.* to slurm;flush privileges;"
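Since error 1045 is an authentication failure, it can help to confirm that the account slurmdbd uses can actually log in before restarting the service. A minimal sketch, assuming the slurm/sen2agri credentials and the slurm_acct_db name used by the install script; slurm_db_login_ok is a hypothetical helper, not part of Sen2Agri or Slurm.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Sen2Agri or Slurm): succeeds only if the
# given MariaDB account can log in over the local socket and open the database.
# Defaults assume the credentials used by the Sen2Agri install script.
slurm_db_login_ok() {
    local user="${1:-slurm}" pass="${2:-sen2agri}" db="${3:-slurm_acct_db}"
    # -h localhost connects as 'user'@'localhost', the account named in the
    # slurmdbd errors; output is discarded, only the exit status matters.
    mysql -h localhost -u "$user" -p"$pass" -e "USE \`$db\`; SELECT 1;" >/dev/null 2>&1
}
```

For example, slurm_db_login_ok && echo "login OK" || echo "login failed". If it fails, the password stored in MariaDB for 'slurm'@'localhost' probably does not match StoragePass in slurmdbd.conf.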

But again, there are a lot of other steps involved in the SLURM (and system) setup, which is why I strongly discourage doing them manually.

Hello,

This is the latest error message after entering the last line of your code.

ERROR 1396 (HY000) at line 1: Operation CREATE USER failed for 'slurm'@'localhost'

Anyway, I also dropped the slurm_acct_db database, since the error said it already existed.

Thank you!

Is SLURM working now?

Hello,

It’s not working.

Regards.

Does slurmdbd give the same access denied errors as before when you try to restart it?

What does sudo cat /var/log/slurm/slurmdbd.log show?
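Because slurmdbd repeats the same messages every few seconds, a short summary can be easier to read than the whole file. A minimal sketch, assuming the "[timestamp] error: ..." line format shown in this thread; summarize_slurm_errors is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how often each distinct error/fatal message
# appears in a Slurm log, ignoring the leading "[timestamp]".
summarize_slurm_errors() {
    local file="${1:-/var/log/slurm/slurmdbd.log}"
    # 1. strip optional indentation and the "[...]" timestamp prefix
    # 2. keep only error/fatal lines
    # 3. count identical messages, most frequent first
    sed -E 's/^[[:space:]]*\[[^]]*\][[:space:]]*//' "$file" \
        | grep -E '^(error|fatal):' \
        | sort | uniq -c | sort -rn
}
```

The log is read with sudo in this thread, so the function would need to run from a root shell, e.g. summarize_slurm_errors /var/log/slurm/slurmdbd.log. A single dominant "Access denied" line means nothing has changed since the last attempt.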

Hi,
It seems that I get the same output in the log.

    [2018-01-03T19:39:30.601] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
    [2018-01-03T19:39:30.601] error: The database must be up when starting the MYSQL plugin.  Trying again in 5 seconds.
    [2018-01-03T19:39:35.602] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
    [2018-01-03T19:39:35.602] error: The database must be up when starting the MYSQL plugin.  Trying again in 5 seconds.
    [2018-01-03T19:39:40.602] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)

I have already reset the MySQL root password to a new one. Could that have something to do with the error?

Is the information in /etc/slurm/slurmdbd.conf correct? It should contain something like:

    StorageType=accounting_storage/mysql
    #StorageHost=localhost
    #StoragePort=1234
    StorageUser=slurm
    StoragePass=sen2agri
    #StorageLoc=slurm_acct_db
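To compare the live file with these values without opening an editor, a small helper can print individual keys. A sketch, assuming plain Key=Value lines as above, with commented-out keys treated as absent; conf_get is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from a slurmdbd.conf-style file.
# Only active "Key=Value" lines match; lines starting with '#' are skipped
# because their first field becomes "#Key".
conf_get() {
    local key="$1" file="${2:-/etc/slurm/slurmdbd.conf}"
    # Remove everything up to the first '=' so values containing '=' survive.
    awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$file"
}
```

As root (the file holds the database password), conf_get StorageUser and conf_get StoragePass should print slurm and sen2agri; those values can then be tried directly with mysql -u "$(conf_get StorageUser)" -p"$(conf_get StoragePass)" -e 'SHOW DATABASES;'.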

I guess I have the same config as you.

Regards,

Then maybe try this again:

    mysql -u root -p

    create database slurm_acct_db;
    create user slurm@localhost;
    set password for slurm@localhost = password('sen2agri');
    grant usage on *.* to slurm;
    grant all privileges on slurm_acct_db.* to slurm;
    flush privileges;
    \q

If you get an error saying that the database or the user already exists, skip that statement and continue with the next one.
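Because CREATE USER fails with ERROR 1396 when the account already exists, the same setup can be written so it is safe to re-run. A sketch, assuming MariaDB, where GRANT ... IDENTIFIED BY creates a missing account or resets the password of an existing one (MySQL 8 removed that form), and the slurm/sen2agri/slurm_acct_db values from this thread; slurm_db_setup_sql is a hypothetical helper. It grants to 'slurm'@'localhost' explicitly, the account named in the slurmdbd errors, whereas "to slurm" without a host refers to 'slurm'@'%'.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print re-runnable SQL for the Slurm accounting database.
# Assumes MariaDB: GRANT ... IDENTIFIED BY creates the account if it is missing
# and resets its password if it already exists (so no ERROR 1396).
# Values are not escaped, so they must not contain quote characters.
slurm_db_setup_sql() {
    local user="${1:-slurm}" pass="${2:-sen2agri}" db="${3:-slurm_acct_db}"
    cat <<EOF
CREATE DATABASE IF NOT EXISTS \`$db\`;
GRANT ALL PRIVILEGES ON \`$db\`.* TO '$user'@'localhost' IDENTIFIED BY '$pass';
FLUSH PRIVILEGES;
EOF
}
```

Usage: slurm_db_setup_sql | mysql -u root -p (mysql still prompts for the root password on the terminal), then restart slurmdbd.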

Hello,

I encountered this problem on the second line (create user slurm@localhost;):

ERROR 1396 (HY000): Operation CREATE USER failed for 'slurm'@'localhost'

Regards,

Hi again,

Actually, I have uninstalled and reinstalled the whole Sen2Agri system, and I have run into this Slurm problem at least twice already.

ERROR 1396 (HY000): Operation CREATE USER failed for 'slurm'@'localhost'

It sounds like the user already exists. Try skipping that command and running the next ones.
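To see which slurm accounts exist and what they may access, the account table and grants can be inspected as root. A sketch, assuming the standard mysql.user table; slurm_account_report_sql is a hypothetical helper. After the commands above, the list may include both 'slurm'@'localhost' and 'slurm'@'%', because "grant ... to slurm" targets the '%' host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print SQL that lists every account named USER and the
# privileges of USER@localhost. Intended to be piped into a root mysql session.
slurm_account_report_sql() {
    local user="${1:-slurm}"
    cat <<EOF
SELECT User, Host FROM mysql.user WHERE User = '$user';
SHOW GRANTS FOR '$user'@'localhost';
EOF
}
```

Usage: slurm_account_report_sql | mysql -u root -p. If 'slurm'@'localhost' does not exist, SHOW GRANTS fails with an error, but the account list is printed first.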

Hello,

Here are the results of the commands:

Screenshot at 2018-01-23 12-05-04

Regards.

So slurmdbd is now running, but slurmctld is not. Can you check /var/log/slurm/slurm.log?

Hello,

Here is the result:

[2018-01-23T11:57:25.355] layouts: no layout to initialize
[2018-01-23T11:57:25.448] layouts: loading entities/relations information
[2018-01-23T11:57:25.449] error: Could not open node state file /var/spool/slurm/node_state: No such file or directory
[2018-01-23T11:57:25.449] error: NOTE: Trying backup state save file. Information may be lost!
[2018-01-23T11:57:25.449] No node state file (/var/spool/slurm/node_state.old) to recover
[2018-01-23T11:57:25.449] error: Incomplete node data checkpoint file
[2018-01-23T11:57:25.449] Recovered state of 0 nodes
[2018-01-23T11:57:25.449] error: Could not open job state file /var/spool/slurm/job_state: No such file or directory
[2018-01-23T11:57:25.449] error: NOTE: Trying backup state save file. Jobs may be lost!
[2018-01-23T11:57:25.449] No job state file (/var/spool/slurm/job_state.old) to recover
[2018-01-23T11:57:25.456] cons_res: select_p_node_init
[2018-01-23T11:57:25.456] cons_res: preparing for 2 partitions
[2018-01-23T11:57:25.456] error: Could not open reservation state file /var/spool/slurm/resv_state: No such file or directory
[2018-01-23T11:57:25.456] error: NOTE: Trying backup state save file. Reservations may be lost
[2018-01-23T11:57:25.456] No reservation state file (/var/spool/slurm/resv_state.old) to recover
[2018-01-23T11:57:25.456] Recovered state of 0 reservations
[2018-01-23T11:57:25.456] error: Could not open trigger state file /var/spool/slurm/trigger_state: No such file or directory
[2018-01-23T11:57:25.456] error: NOTE: Trying backup state save file. Triggers may be lost!
[2018-01-23T11:57:25.456] No trigger state file (/var/spool/slurm/trigger_state.old) to recover
[2018-01-23T11:57:25.456] error: Incomplete trigger data checkpoint file
[2018-01-23T11:57:25.458] read_slurm_conf: backup_controller not specified.
[2018-01-23T11:57:25.458] Reinitializing job accounting state
[2018-01-23T11:57:25.458] Ending any jobs in accounting that were running when controller went down on
[2018-01-23T11:57:25.458] cons_res: select_p_reconfigure
[2018-01-23T11:57:25.458] cons_res: select_p_node_init
[2018-01-23T11:57:25.458] cons_res: preparing for 2 partitions
[2018-01-23T11:57:25.458] Running as primary controller
[2018-01-23T11:57:25.458] Registering slurmctld at port 6817 with slurmdbd.
[2018-01-23T11:57:25.579] error: slurmdbd: Issue with call DBD_REGISTER_CTLD(1434): 4294967295(This cluster hasn't been added to accounting yet)
[2018-01-23T11:57:25.579] fatal: You need to add this cluster to accounting if you want to enforce associations, or no jobs will ever run.

Regards.

You might be able to fix that by running sudo sacctmgr add cluster sen2agri, but again, the installer should have done so. It’s really strange.
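The name passed to sacctmgr has to match the ClusterName that slurmctld registers with (the DBD_REGISTER_CTLD error above). A minimal sketch that reads it from slurm.conf instead of hard-coding sen2agri, assuming the file is /etc/slurm/slurm.conf next to slurmdbd.conf; slurm_cluster_name is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ClusterName value from slurm.conf.
# Slurm parameter names are case-insensitive, so the key is compared in
# lowercase; commented-out lines ("#ClusterName=...") never match.
slurm_cluster_name() {
    local file="${1:-/etc/slurm/slurm.conf}"
    awk -F= 'tolower($1) == "clustername" { gsub(/[[:space:]]/, "", $2); print $2; exit }' "$file"
}
```

From a root shell, sacctmgr -i add cluster "$(slurm_cluster_name)" adds it without the confirmation prompt; sacctmgr show cluster should then list it, and slurmctld can be restarted.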

Hello,

Actually, my colleague asked me to uninstall and reinstall Sen2Agri (for evaluation purposes), and the Slurm services now seem to be running:

Screenshot at 2018-01-25 15-39-50

The same with mariadb:

Screenshot at 2018-01-25 15-54-47

But an error still shows up when I run sudo cat /var/log/slurm/slurmdbd.log:

Screenshot at 2018-01-25 15-56-37

Regards.

Hi,

Maybe a restart will help, but assuming that the configuration files are in place and the install script was run, I don’t understand what could cause this error.
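If a restart is attempted, the order can matter: slurmdbd needs MariaDB, slurmctld registers with slurmdbd at startup (as the slurm.log above shows), and slurmd talks to slurmctld. A sketch assuming the systemd units mariadb, slurmdbd, slurmctld and slurmd (slurmd is an assumption; this thread only shows the first three); restart_slurm_stack and DRY_RUN are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart the database and Slurm daemons in dependency order.
# Set DRY_RUN=1 to print the commands instead of running them.
restart_slurm_stack() {
    local svc
    for svc in mariadb slurmdbd slurmctld slurmd; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "systemctl restart $svc"
            continue
        fi
        sudo systemctl restart "$svc" || { echo "failed to restart $svc" >&2; return 1; }
        sleep 2  # arbitrary pause so each daemon is up before the next one connects
    done
}
```

Afterwards, systemctl is-active mariadb slurmdbd slurmctld slurmd prints one state per unit, which makes it easy to spot the daemon that did not come up.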

Hi,

Unfortunately, I have to reopen this issue.

On sen2agri-2.0.3 this error occurs, and we were not able to get Slurm running with the install script. The error message is always:

    slurmdbd: Issue with call DBD_REGISTER_CTLD(1434): 4294967295(This cluster hasn't been added to accounting yet)

On our test machine I manually reinitialized the Slurm services and the cluster and QoS registration, and it somehow works now. I have no clue why. Any ideas?
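Since manual re-registration helped, one way to narrow this down on the next install is to check, right after the installer finishes, whether the cluster is actually present in accounting. A sketch, assuming the cluster name sen2agri; cluster_registered is a hypothetical helper that reads the output of sacctmgr -n -P show cluster format=Cluster on standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if NAME appears as a whole line on stdin.
# Intended input: `sacctmgr -n -P show cluster format=Cluster`
# (-n drops the header, -P prints parsable output, here one cluster per line).
cluster_registered() {
    local name="${1:-sen2agri}"
    grep -qxF -- "$name"
}
```

For example, sacctmgr -n -P show cluster format=Cluster | cluster_registered sen2agri || echo "sen2agri is missing from accounting". The QoS entries can be listed similarly with sacctmgr show qos.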