[2018-01-03T17:19:58.779] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:19:58.779] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
[2018-01-03T17:20:03.780] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:20:03.780] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
[2018-01-03T17:20:08.780] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:20:08.780] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
[2018-01-03T17:20:13.781] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:20:13.781] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
[2018-01-03T17:22:35.924] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:22:35.924] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
[2018-01-03T17:22:40.925] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
This is only a snippet of the log, but all the errors are the same.
That means that the SLURM database is not set up. You previously mentioned doing a manual installation; that could be the cause.
The install script runs the following (as root):
yum -y install mariadb-server mariadb
systemctl start mariadb
systemctl enable mariadb
mysql_secure_installation # it answers with 'y' on everything, but doesn't change the root password
mysql -u root -p -e "create database slurm_acct_db;create user slurm@localhost;set password for slurm@localhost = password('sen2agri');grant usage on *.* to slurm;grant all privileges on slurm_acct_db.* to slurm;flush privileges;" # this will ask for the root password, which is empty
But again, there are a lot of other steps involved in the SLURM (and system) setup, which is why I strongly discourage doing them manually.
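If you want to verify the database side independently, you can try connecting as the slurm user yourself. This is only a quick sanity check; 'sen2agri' is the password the install script sets, so adjust it if yours differs:

```shell
# Try to connect as the slurm account the install script creates.
# If this fails with "Access denied", the user, its password, or its
# grants do not match what slurmdbd expects (see StorageUser/StoragePass
# in slurmdbd.conf).
mysql -u slurm -psen2agri -e "show databases;"
```

If `slurm_acct_db` shows up in the output, the database and user are fine and the problem is on the slurmdbd configuration side instead.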
Hi,
It seems I am getting the same output in the log.
[2018-01-03T19:39:30.601] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T19:39:30.601] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
[2018-01-03T19:39:35.602] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T19:39:35.602] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
[2018-01-03T19:39:40.602] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
I have already reset the MySQL root password to a new one. Could this have something to do with the error?
mysql -u root -p
create database slurm_acct_db;
create user slurm@localhost;
set password for slurm@localhost = password('sen2agri');
grant usage on *.* to slurm;
grant all privileges on slurm_acct_db.* to slurm;
flush privileges;
\q
If you get an error saying that the database or user already exists, you can skip to the next line.
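If you are unsure what already exists before running the statements above, you can inspect the current state first (a sanity check run as root; the prompts will ask for the root password):

```shell
# List any existing 'slurm' accounts and the hosts they are bound to.
mysql -u root -p -e "select user, host from mysql.user where user='slurm';"

# Show the privileges currently granted to slurm@localhost;
# this should include ALL PRIVILEGES on slurm_acct_db.*.
mysql -u root -p -e "show grants for slurm@localhost;"
```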
[2018-01-23T11:57:25.355] layouts: no layout to initialize
[2018-01-23T11:57:25.448] layouts: loading entities/relations information
[2018-01-23T11:57:25.449] error: Could not open node state file /var/spool/slurm/node_state: No such file or directory
[2018-01-23T11:57:25.449] error: NOTE: Trying backup state save file. Information may be lost!
[2018-01-23T11:57:25.449] No node state file (/var/spool/slurm/node_state.old) to recover
[2018-01-23T11:57:25.449] error: Incomplete node data checkpoint file
[2018-01-23T11:57:25.449] Recovered state of 0 nodes
[2018-01-23T11:57:25.449] error: Could not open job state file /var/spool/slurm/job_state: No such file or directory
[2018-01-23T11:57:25.449] error: NOTE: Trying backup state save file. Jobs may be lost!
[2018-01-23T11:57:25.449] No job state file (/var/spool/slurm/job_state.old) to recover
[2018-01-23T11:57:25.456] cons_res: select_p_node_init
[2018-01-23T11:57:25.456] cons_res: preparing for 2 partitions
[2018-01-23T11:57:25.456] error: Could not open reservation state file /var/spool/slurm/resv_state: No such file or directory
[2018-01-23T11:57:25.456] error: NOTE: Trying backup state save file. Reservations may be lost
[2018-01-23T11:57:25.456] No reservation state file (/var/spool/slurm/resv_state.old) to recover
[2018-01-23T11:57:25.456] Recovered state of 0 reservations
[2018-01-23T11:57:25.456] error: Could not open trigger state file /var/spool/slurm/trigger_state: No such file or directory
[2018-01-23T11:57:25.456] error: NOTE: Trying backup state save file. Triggers may be lost!
[2018-01-23T11:57:25.456] No trigger state file (/var/spool/slurm/trigger_state.old) to recover
[2018-01-23T11:57:25.456] error: Incomplete trigger data checkpoint file
[2018-01-23T11:57:25.458] read_slurm_conf: backup_controller not specified.
[2018-01-23T11:57:25.458] Reinitializing job accounting state
[2018-01-23T11:57:25.458] Ending any jobs in accounting that were running when controller went down on
[2018-01-23T11:57:25.458] cons_res: select_p_reconfigure
[2018-01-23T11:57:25.458] cons_res: select_p_node_init
[2018-01-23T11:57:25.458] cons_res: preparing for 2 partitions
[2018-01-23T11:57:25.458] Running as primary controller
[2018-01-23T11:57:25.458] Registering slurmctld at port 6817 with slurmdbd.
[2018-01-23T11:57:25.579] error: slurmdbd: Issue with call DBD_REGISTER_CTLD(1434): 4294967295(This cluster hasn't been added to accounting yet)
[2018-01-23T11:57:25.579] fatal: You need to add this cluster to accounting if you want to enforce associations, or no jobs will ever run.
Actually, my colleague asked me to uninstall and reinstall sen2agri (for evaluation purposes), and the slurm services now seem to be running:
The same with mariadb:
But some errors still appear when I run sudo cat /var/log/slurm/slurmdbd.log:
Maybe a restart will help, but – assuming that the configuration files are there and the install script was run – I don’t understand what could cause this error.
On sen2agri-2.0.3 this error occurs, and we were not able to get slurm running via the install script.
The error message is always:
slurmdbd: Issue with call DBD_REGISTER_CTLD(1434): 4294967295(This cluster hasn’t been added to accounting yet)
On our test machine I manually reinitialized the slurm services and redid the cluster and QOS registration, and it somehow works now. I have no clue why. Any ideas?
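For reference, the "This cluster hasn't been added to accounting yet" fatal error usually means slurmctld registered with slurmdbd before the cluster was created in the accounting database. The manual re-registration can be sketched like this (the cluster name 'sen2agri' is an assumption; use the ClusterName value from your slurm.conf):

```shell
# Add the cluster to the accounting database; -i answers prompts
# non-interactively. The name must match ClusterName in slurm.conf.
sacctmgr -i add cluster sen2agri

# Restart the daemons so slurmctld re-registers with slurmdbd.
systemctl restart slurmdbd
systemctl restart slurmctld
```

After this, `sacctmgr list cluster` should show the cluster, and the DBD_REGISTER_CTLD error should no longer appear in slurmctld.log.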