[2018-01-03T17:19:58.779] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:19:58.779] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
[2018-01-03T17:20:03.780] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:20:03.780] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
[2018-01-03T17:20:08.780] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:20:08.780] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
[2018-01-03T17:20:13.781] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:20:13.781] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
[2018-01-03T17:22:35.924] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:22:35.924] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
[2018-01-03T17:22:40.925] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
This is only a snippet of the log, but all the errors are the same.
That means that the SLURM database is not set up. You previously mentioned doing a manual installation; that could be the cause.
The install script runs the following (as root):
yum -y install mariadb-server mariadb
systemctl start mariadb
systemctl enable mariadb
mysql_secure_installation # it answers with 'y' on everything, but doesn't change the root password
mysql -u root -p -e "create database slurm_acct_db;create user slurm@localhost;set password for slurm@localhost = password('sen2agri');grant usage on *.* to slurm;grant all privileges on slurm_acct_db.* to slurm;flush privileges;" # this will ask for the root password, which is empty
But again, there are a lot of other steps involved in the SLURM (and system) setup, which is why I strongly discourage doing them manually.
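If you want to verify the database side independently, you can try connecting as the slurm user yourself. This is only a quick sanity check; 'sen2agri' is the password the install script sets, so adjust it if yours differs:

```shell
# Try to connect as the slurm account the install script creates.
# If this fails with "Access denied", the user, its password, or its
# grants do not match what slurmdbd expects (see StorageUser/StoragePass
# in slurmdbd.conf).
mysql -u slurm -psen2agri -e "show databases;"
```

If `slurm_acct_db` shows up in the output, the database and user are fine and the problem is on the slurmdbd configuration side instead.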
Hi,
It seems I am getting the same output in the log.
[2018-01-03T19:39:30.601] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T19:39:30.601] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
[2018-01-03T19:39:35.602] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T19:39:35.602] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
[2018-01-03T19:39:40.602] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
I have already reset the MySQL root password to a new one. Could this have something to do with the error?
mysql -u root -p
create database slurm_acct_db;
create user slurm@localhost;
set password for slurm@localhost = password('sen2agri');
grant usage on *.* to slurm;
grant all privileges on slurm_acct_db.* to slurm;
flush privileges;
\q
If you get an error saying that the database or user already exists, you can skip to the next line.
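If you are unsure what already exists before running the statements above, you can inspect the current state first (a sanity check run as root; the prompts will ask for the root password):

```shell
# List any existing 'slurm' accounts and the hosts they are bound to.
mysql -u root -p -e "select user, host from mysql.user where user='slurm';"

# Show the privileges currently granted to slurm@localhost;
# this should include ALL PRIVILEGES on slurm_acct_db.*.
mysql -u root -p -e "show grants for slurm@localhost;"
```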
[2018-01-23T11:57:25.355] layouts: no layout to initialize
[2018-01-23T11:57:25.448] layouts: loading entities/relations information
[2018-01-23T11:57:25.449] error: Could not open node state file /var/spool/slurm/node_state: No such file or directory
[2018-01-23T11:57:25.449] error: NOTE: Trying backup state save file. Information may be lost!
[2018-01-23T11:57:25.449] No node state file (/var/spool/slurm/node_state.old) to recover
[2018-01-23T11:57:25.449] error: Incomplete node data checkpoint file
[2018-01-23T11:57:25.449] Recovered state of 0 nodes
[2018-01-23T11:57:25.449] error: Could not open job state file /var/spool/slurm/job_state: No such file or directory
[2018-01-23T11:57:25.449] error: NOTE: Trying backup state save file. Jobs may be lost!
[2018-01-23T11:57:25.449] No job state file (/var/spool/slurm/job_state.old) to recover
[2018-01-23T11:57:25.456] cons_res: select_p_node_init
[2018-01-23T11:57:25.456] cons_res: preparing for 2 partitions
[2018-01-23T11:57:25.456] error: Could not open reservation state file /var/spool/slurm/resv_state: No such file or directory
[2018-01-23T11:57:25.456] error: NOTE: Trying backup state save file. Reservations may be lost
[2018-01-23T11:57:25.456] No reservation state file (/var/spool/slurm/resv_state.old) to recover
[2018-01-23T11:57:25.456] Recovered state of 0 reservations
[2018-01-23T11:57:25.456] error: Could not open trigger state file /var/spool/slurm/trigger_state: No such file or directory
[2018-01-23T11:57:25.456] error: NOTE: Trying backup state save file. Triggers may be lost!
[2018-01-23T11:57:25.456] No trigger state file (/var/spool/slurm/trigger_state.old) to recover
[2018-01-23T11:57:25.456] error: Incomplete trigger data checkpoint file
[2018-01-23T11:57:25.458] read_slurm_conf: backup_controller not specified.
[2018-01-23T11:57:25.458] Reinitializing job accounting state
[2018-01-23T11:57:25.458] Ending any jobs in accounting that were running when controller went down on
[2018-01-23T11:57:25.458] cons_res: select_p_reconfigure
[2018-01-23T11:57:25.458] cons_res: select_p_node_init
[2018-01-23T11:57:25.458] cons_res: preparing for 2 partitions
[2018-01-23T11:57:25.458] Running as primary controller
[2018-01-23T11:57:25.458] Registering slurmctld at port 6817 with slurmdbd.
[2018-01-23T11:57:25.579] error: slurmdbd: Issue with call DBD_REGISTER_CTLD(1434): 4294967295(This cluster hasn't been added to accounting yet)
[2018-01-23T11:57:25.579] fatal: You need to add this cluster to accounting if you want to enforce associations, or no jobs will ever run.
Actually, my colleague asked me to uninstall and reinstall sen2agri (for evaluation purposes), and the slurm services now seem to be running:
The same with mariadb:
But some errors still appear when I run sudo cat /var/log/slurm/slurmdbd.log:
Maybe a restart will help, but – assuming that the configuration files are there and the install script was run – I don’t understand what could cause this error.
On sen2agri-2.0.3 this error occurs, and we were not able to get slurm running via the install script.
The error message is always:
slurmdbd: Issue with call DBD_REGISTER_CTLD(1434): 4294967295(This cluster hasn’t been added to accounting yet)
On our test machine I manually reinitialized the slurm services and redid the cluster and QOS registration, and it somehow works now. I have no clue why. Any ideas?
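For reference, the "This cluster hasn't been added to accounting yet" fatal error usually means slurmctld registered with slurmdbd before the cluster was created in the accounting database. The manual re-registration can be sketched like this (the cluster name 'sen2agri' is an assumption; use the ClusterName value from your slurm.conf):

```shell
# Add the cluster to the accounting database; -i answers prompts
# non-interactively. The name must match ClusterName in slurm.conf.
sacctmgr -i add cluster sen2agri

# Restart the daemons so slurmctld re-registers with slurmdbd.
systemctl restart slurmdbd
systemctl restart slurmctld
```

After this, `sacctmgr list cluster` should show the cluster, and the DBD_REGISTER_CTLD error should no longer appear in slurmctld.log.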