How to enable slurm services

Yes, it’s required. We should try to get slurmdbd running, then slurmctld, then slurm. To see the reason a daemon failed to start you can use e.g.:

journalctl -u slurmdbd -b
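
The start order described above can be sketched as a small shell function. This is only an illustration, not something the installer provides: `start_in_order` and the `SYSTEMCTL` override variable are names made up for this example, and the unit names you pass in have to match what is installed on your system.

```shell
# Sketch: start units in the given order and stop at the first failure.
# start_in_order and the SYSTEMCTL override are hypothetical, for illustration only.
start_in_order() {
    for unit in "$@"; do
        if ! "${SYSTEMCTL:-systemctl}" start "$unit"; then
            echo "failed to start $unit; check: journalctl -u $unit" >&2
            return 1
        fi
    done
}
# Usage (as root); the name of the last unit is an assumption and may differ on your system:
#   start_in_order slurmdbd slurmctld slurmd
```

Setting `SYSTEMCTL=echo` turns it into a dry run that only prints the start commands, which is handy for checking the order before touching the services.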

Note that our installer deploys some configuration files for all these. In our configuration, slurmdbd requires a MariaDB database (which the installer normally sets up).

Assuming everything was installed properly, keep in mind that MariaDB doesn’t cope well with system crashes and power failures, so if you had one, you’ll need to run a couple of commands to repair the database.

Hello,

This is the result:

-- Logs begin at Thu 2017-12-21 05:54:07 +08, end at Wed 2018-01-03 17:15:57 +08. --
Jan 02 10:19:06 ESSCGeo-Sen2Agri systemd[1]: Starting Slurm DBD accounting daemon...
Jan 02 10:20:36 ESSCGeo-Sen2Agri systemd[1]: slurmdbd.service start operation timed out. Terminating.
Jan 02 10:20:36 ESSCGeo-Sen2Agri systemd[1]: Failed to start Slurm DBD accounting daemon.
Jan 02 10:20:36 ESSCGeo-Sen2Agri systemd[1]: Unit slurmdbd.service entered failed state.
Jan 02 10:20:36 ESSCGeo-Sen2Agri systemd[1]: slurmdbd.service failed.

When I tried to start it:

`Job for slurmdbd.service failed because a timeout was exceeded. See "systemctl status slurmdbd.service" and "journalctl -xe" for details.`

Can I uninstall and reinstall it again?

That’s strange. Maybe try starting it again:

systemctl status mariadb
systemctl start slurmdbd
journalctl -u slurmdbd --since today

Can I uninstall and reinstall it again?

That usually doesn’t help.

Mariadb is active:

● mariadb.service - MariaDB database server
   Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
   Active: active (running) since Sat 2017-12-23 10:40:46 +08; 1 weeks 4 days ago

But slurmdbd is not:

`Job for slurmdbd.service failed because a timeout was exceeded. See "systemctl status slurmdbd.service" and "journalctl -xe" for details.`
-- Logs begin at Thu 2017-12-21 18:34:07 +08, end at Wed 2018-01-03 19:43:55 +08. --
Jan 03 17:18:48 ESSCGeo-Sen2Agri systemd[1]: Starting Slurm DBD accounting daemon...
Jan 03 17:20:18 ESSCGeo-Sen2Agri systemd[1]: slurmdbd.service start operation timed out. Terminating.
Jan 03 17:20:18 ESSCGeo-Sen2Agri systemd[1]: Failed to start Slurm DBD accounting daemon.
Jan 03 17:20:18 ESSCGeo-Sen2Agri systemd[1]: Unit slurmdbd.service entered failed state.
Jan 03 17:20:18 ESSCGeo-Sen2Agri systemd[1]: slurmdbd.service failed.
Jan 03 17:22:35 ESSCGeo-Sen2Agri systemd[1]: Starting Slurm DBD accounting daemon...
Jan 03 17:24:05 ESSCGeo-Sen2Agri systemd[1]: slurmdbd.service start operation timed out. Terminating.
Jan 03 17:24:05 ESSCGeo-Sen2Agri systemd[1]: Failed to start Slurm DBD accounting daemon.
Jan 03 17:24:05 ESSCGeo-Sen2Agri systemd[1]: Unit slurmdbd.service entered failed state.
Jan 03 17:24:05 ESSCGeo-Sen2Agri systemd[1]: slurmdbd.service failed.
Jan 03 19:38:25 ESSCGeo-Sen2Agri systemd[1]: Starting Slurm DBD accounting daemon...
Jan 03 19:39:55 ESSCGeo-Sen2Agri systemd[1]: slurmdbd.service start operation timed out. Terminating.
Jan 03 19:39:55 ESSCGeo-Sen2Agri systemd[1]: Failed to start Slurm DBD accounting daemon.
Jan 03 19:39:55 ESSCGeo-Sen2Agri systemd[1]: Unit slurmdbd.service entered failed state.
Jan 03 19:39:55 ESSCGeo-Sen2Agri systemd[1]: slurmdbd.service failed.

What about sudo cat /var/log/slurm/slurmdbd.log?

[2018-01-03T17:19:58.779] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:19:58.779] error: The database must be up when starting the MYSQL plugin.  Trying again in 5 seconds.
[2018-01-03T17:20:03.780] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:20:03.780] error: The database must be up when starting the MYSQL plugin.  Trying again in 5 seconds.
[2018-01-03T17:20:08.780] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:20:08.780] error: The database must be up when starting the MYSQL plugin.  Trying again in 5 seconds.
[2018-01-03T17:20:13.781] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:20:13.781] error: The database must be up when starting the MYSQL plugin.  Trying again in 5 seconds.
[2018-01-03T17:22:35.924] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
[2018-01-03T17:22:35.924] error: The database must be up when starting the MYSQL plugin.  Trying again in 5 seconds.
[2018-01-03T17:22:40.925] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)

This is only a snippet of the error, but the errors are all the same.
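
One way to check that every error in the log really is the same is to strip the timestamps and count the distinct messages. A minimal sketch, assuming the log lines start with a `[timestamp]` prefix as in the excerpt above; `summarize_log` is a hypothetical helper name, not a Slurm tool.

```shell
# Sketch: collapse a slurmdbd log into distinct messages with counts.
# summarize_log is a hypothetical helper; it strips the leading [timestamp] from each line.
summarize_log() {
    # $1 = path to the log file
    sed 's/^ *\[[^]]*\] *//' "$1" | sort | uniq -c | sort -rn
}
# Usage:
#   sudo sh -c '. ./summarize.sh; summarize_log /var/log/slurm/slurmdbd.log'
# (summarize.sh is an assumed file name for wherever you saved the function)
```

If the output only shows the two messages from the excerpt, the daemon is failing for a single reason: the database is refusing the `slurm` login.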

That means that the SLURM database is not set up. You previously mentioned doing a manual installation, which could be the cause.

The install script runs the following (as root):

yum -y install mariadb-server mariadb
systemctl start mariadb
systemctl enable mariadb
mysql_secure_installation # it answers with 'y' on everything, but doesn't change the root password
mysql -u root -p -e "create database slurm_acct_db;create user slurm@localhost;set password for slurm@localhost = password('sen2agri');grant usage on *.* to slurm;grant all privileges on slurm_acct_db.* to slurm;flush privileges;" # this will ask for the root password, which is empty

But again, there are a lot of other steps involved in the SLURM (and system) setup, which is why I strongly discourage doing them manually.

Hello,

This is the latest error message after entering the last line of your code.

ERROR 1396 (HY000) at line 1: Operation CREATE USER failed for 'slurm'@'localhost'

Anyway, I also dropped the slurm_acct_db database, since it said that it already existed.

Thank you!

Is SLURM working now?

Hello,

It’s not working.

Regards.

Does slurmdbd give the same access denied errors as before when you try to restart it?

sudo cat /var/log/slurm/slurmdbd.log?

Hi,
It seems that I got the same output in the log.

    [2018-01-03T19:39:30.601] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
    [2018-01-03T19:39:30.601] error: The database must be up when starting the MYSQL plugin.  Trying again in 5 seconds.
    [2018-01-03T19:39:35.602] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)
    [2018-01-03T19:39:35.602] error: The database must be up when starting the MYSQL plugin.  Trying again in 5 seconds.
    [2018-01-03T19:39:40.602] error: mysql_real_connect failed: 1045 Access denied for user 'slurm'@'localhost' (using password: YES)

I already reset the root password of mysql to a new one. Does this have something to do with the error?

Is the information in /etc/slurm/slurmdbd.conf correct? It should contain something like:

StorageType=accounting_storage/mysql
#StorageHost=localhost
#StoragePort=1234
StorageUser=slurm
StoragePass=sen2agri
#StorageLoc=slurm_acct_db
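
To rule out a mismatch between this file and the database, you could read the credentials straight from the config and try the same login that slurmdbd attempts. A rough sketch, assuming the file uses plain `Key=value` lines with `#` marking commented-out entries as shown above; `conf_value` is a hypothetical helper, not part of Slurm.

```shell
# Sketch: read a key from a slurmdbd.conf-style file (Key=value, '#' marks a commented line).
# conf_value is a hypothetical helper, not part of Slurm.
conf_value() {
    # $1 = key, $2 = file; prints the value of the last uncommented match
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | tail -n 1
}
# Usage: try the login slurmdbd would use (run as a user that can read the file):
#   conf=/etc/slurm/slurmdbd.conf
#   mysql -u "$(conf_value StorageUser "$conf")" -p"$(conf_value StoragePass "$conf")" -e 'SELECT 1;'
```

If that `mysql` call fails with error 1045 as well, the problem is on the database side (the account or its password), not in slurmdbd itself.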

I guess I have the same config as you.

Regards,

Then maybe try this again:

mysql -u root -p

create database slurm_acct_db;
create user slurm@localhost;
set password for slurm@localhost = password('sen2agri');
grant usage on *.* to slurm;
grant all privileges on slurm_acct_db.* to slurm;
flush privileges;
\q

If you get an error saying that the database or user already exists, you can skip to the next line.
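
As an alternative to skipping lines by hand, the statements can be written so that re-running them is harmless. This is a sketch under assumptions, not what the install script does: it assumes MariaDB, where `GRANT ... IDENTIFIED BY` should create the account if it is missing and reset its password if it already exists (MySQL 8 no longer accepts that form). `slurm_db_sql` is a hypothetical helper that only prints the SQL, and it names the account as `'slurm'@'localhost'` explicitly.

```shell
# Sketch: print SQL that can be re-run safely when the database or account already exists.
# slurm_db_sql is a hypothetical helper; it only prints the statements and runs nothing.
slurm_db_sql() {
    # $1 = password for the slurm account (must not contain single quotes)
    cat <<EOF
CREATE DATABASE IF NOT EXISTS slurm_acct_db;
GRANT ALL PRIVILEGES ON slurm_acct_db.* TO 'slurm'@'localhost' IDENTIFIED BY '$1';
FLUSH PRIVILEGES;
EOF
}
# Usage (prompts for the MariaDB root password):
#   slurm_db_sql sen2agri | mysql -u root -p
```

Printing the SQL first lets you review it before piping it into `mysql`.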

Hello,

I encountered this problem on the second line (create user slurm@localhost;):

ERROR 1396 (HY000): Operation CREATE USER failed for 'slurm'@'localhost'

Regards,

Hi again,

Actually, I have tried uninstalling and reinstalling the whole Sen2Agri system, and I have run into this Slurm problem at least twice already.

ERROR 1396 (HY000): Operation CREATE USER failed for 'slurm'@'localhost'

It sounds like the user already exists. Try skipping that command and running the next ones.

Hello,

Here are the results of the commands:

Screenshot at 2018-01-23 12-05-04

Regards.

So slurmdbd is now running, but slurmctld is not. Can you check /var/log/slurm/slurm.log?