Configuring the Software

Documentation for the software we're configuring can usually be found somewhere in the top-level directory of the source code. Sometimes the documentation is in a file called INSTALL or README or some other file, usually named in capital letters, but not always, so you'll need to poke around a little. Do read through the documentation, though. I'll be configuring the software quickly and a bit haphazardly just to get things up and running. Once I get things running, I can do some testing and go back to optimize things.

Modules

To start, we want to configure the Modules package so we can add and remove software packages for our environment. The Modules package is fairly simple to work with, but provides some very powerful features. We're going to start simple and then expand the functionality as we need it. At the end of the make install step, there's a notice printed that says:

NOTICE: Modules installation is complete. Please read the 'Configuration' section in INSTALL.txt to learn how to adapt your installation and make it fit your needs.

Looking through the documentation, there's a quick test to see if our installation is at least mostly correct. Running "modulecmd sh" should print out the usage instructions. A quick test shows this works correctly, so we're on the right path. Next, we can run make test, but this requires dejagnu which we haven't installed. You can install it and run the test suite if you like. I'm going to hope for the best and move on. Unfortunately, we don't have any other software packages that we can add to Modules yet, but we can at least get it set up in the system.
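
If you want to try those checks yourself, the commands look roughly like this; the exact location of modulecmd under the install prefix depends on your configure options, and dejagnu is only needed if you want to run the test suite:

modulecmd sh              # should print the usage instructions
zypper install dejagnu    # as root, only if you want to run the tests
make test                 # from the Modules source directory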

For the Modules package to function correctly, a number of environment variables and functions or aliases need to be set up. These are usually set up at the system level through startup files (run commands) for the user's shell. There are example startup files in the init/ directory under the directory where we installed Modules. Unfortunately, the examples are for a slightly older shell startup system, but we can easily make them work with the new system. Run the following commands, and everything should be set up correctly.

cd /apps/modules-4.2.0/init
cp profile.csh /etc/profile.d/modules.csh
cp profile.sh /etc/profile.d/modules.sh

Now log out, log back in as user admin, and run module avail to see if things work correctly. You should get output similar to the following.

admin@baker:~> module avail
--------------------------- /apps/modules-4.2.0/modulefiles ----------------------------
dot  module-git  module-info  modules  null  use.own  
admin@baker:~>

This shows that we have access to all the default modules, so things look good. There are some default modules already defined. I'm not sure if we need these or not, so let's set up our own directory to store our modulefiles. That will keep things separated from the default files in the distribution, and we can back up just our own directory when we upgrade.

We could create a new subdirectory under /apps/modules-4.2.0/modulefiles, but I'm still a little wary that an upgrade may do something to that directory, so let's make the directory one level up instead and call it /apps/modules-4.2.0/picluster. We don't have any modulefiles to put in it yet, so once it's created we can move on and get munge set up.
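
Here's a minimal sketch of that setup, assuming the install prefix we've been using. Modules also has to be told to search the new directory before module avail will show it; the permanent configuration file for that depends on how you built the package, so check the 'Configuration' section of INSTALL.txt:

mkdir /apps/modules-4.2.0/picluster
# tell Modules to search the new directory for a quick test in the current shell
module use /apps/modules-4.2.0/picluster
# to make it permanent for everyone, add the path to the modulespath
# configuration file described in INSTALL.txt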

munge

Although munge is a fairly simple package in terms of functionality, there's a lot going on behind the scenes. You should read through the documentation for how to configure this package. First, we need to set up a secret key for security, then we can start the munged daemon and test it. I'm going to generate the secret key using the (not so) random number generator, /dev/urandom, but you can use anything that generates 1024 bytes, even just a random 1k file. Here's what I did:

admin@baker:/apps/munge-0.5.13> dd if=/dev/urandom of=/apps/munge-0.5.13/etc/munge/munge.key bs=1 count=1024
1024+0 records in
1024+0 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.0157578 s, 65.0 kB/s
admin@baker:/apps/munge-0.5.13> chmod go-rwx /apps/munge-0.5.13/etc/munge/munge.key
admin@baker:/apps/munge-0.5.13> sbin/munged
admin@baker:/apps/munge-0.5.13> bin/remunge
2018-10-30 04:32:06 Spawning 1 thread for encoding
2018-10-30 04:32:06 Processing credentials for 1 second
2018-10-30 04:32:07 Processed 2218 credentials in 1.010s (2196 creds/sec)
admin@baker:/apps/munge-0.5.13>

It looks like things are working correctly, at least so far. We'll find out if we missed anything when we start plugging all the pieces together. Make a note of the final line that shows how many credentials we processed in one second. We'll revisit this when we start optimizing things.

We're running this as the admin user, but munge should run under its own non-privileged account and group. When we originally installed munge, we created some directories and changed the owner to admin. Now we have to pay for not reading the documentation thoroughly and change the ownership of all those directories to the munge user.

We also want munge to start automatically when the system is rebooted, so we need to copy the munge.service file from /apps/source/munge-0.5.13/src/etc to /usr/lib/systemd/system.

Let's create the munge user and group, and get the systemd startup files in place.

baker:~ # useradd -U munge
baker:~ # id munge
uid=1002(munge) gid=1002(munge) groups=1002(munge)
baker:~ # cp /apps/source/munge-0.5.13/src/etc/munge.service /usr/lib/systemd/system/
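
As for the ownership change mentioned above, it's just a chown over the directories we created during the munge install. The exact list depends on how you laid things out; with the prefix we used, it's roughly:

chown -R munge:munge /apps/munge-0.5.13/etc/munge
chown -R munge:munge /apps/munge-0.5.13/var/log/munge
chown -R munge:munge /apps/munge-0.5.13/var/lib/munge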

Also, munge expects to store its process identifier (PID) under /var/run/munge/munged.pid, so the directory /var/run/munge needs to be owned by munge. The systemd facility has a method to create that directory and change the ownership to munge, so we need to modify the systemd startup file. Edit /usr/lib/systemd/system/munge.service and modify the [Service] section so the file looks like this:

[Unit]
Description=MUNGE authentication service
Documentation=man:munged(8)
After=network.target
After=time-sync.target

[Service]
Type=forking
ExecStart=/apps/munge-0.5.13/sbin/munged
PIDFile=/var/run/munge/munged.pid
RuntimeDirectory=munge
User=munge
Group=munge
Restart=on-abort

[Install]
WantedBy=multi-user.target
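
After editing the unit file, it's worth letting systemd pick it up and checking that munged now runs under the munge account rather than the copy we started by hand earlier. Roughly:

pkill munged                  # stop the manually started daemon
systemctl daemon-reload       # re-read the edited unit file
systemctl start munge
systemctl status munge        # should show munged running as the munge user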

Do normal users need access to munge? I don't think so, but I'm not sure. Let's set up a modulefile anyway, and at least we can test our Modules installation.

There are a lot of things that you can put into a modulefile, but there are a few things that are fairly standard. Modules sets up the user's environment to use a particular software package. To do that, the user's environment has to be augmented to include the binaries directory, the libraries directory, and the manuals (man) directory. It's always good practice to add a help message as well. There's a lot of functionality available, but I'm going to focus on just the basics for now. When you need to read the documentation for the gritty details, you'll know.

A basic modulefile for munge looks like this. We'll use it as a template for all our other modulefiles. If you want a more in-depth tutorial on modules, see Managing User Environments under the HPC For Science section of this site.

#%Module1.0
## munge modulefile
##
proc ModulesHelp { } {
    puts stderr "This module sets up the user environment for munge"
}

module-whatis "Configures a user environment to use munge"
set curMod [module-info name]
prepend-path PATH /apps/munge-0.5.13/bin
prepend-path LIBPATH /apps/munge-0.5.13/lib64
prepend-path MANPATH /apps/munge-0.5.13/share/man

You can save the module as /apps/modules-4.2.0/picluster/munge and it will show up in the list of available modules, but we may implement a newer version at some point, so it's always helpful to have the version number in the name of the module. Modules will also search subdirectories for module files, so that gives us an easy way to manage module files without cluttering up a single directory with a bunch of different module files for different software packages. Furthermore, Modules will let you ask for a module without specifying its version number, and that's pretty handy. You can also specify a default module if you have multiple modules, so you can implement a new version but still keep your users on the old version by default until you've finished testing and are ready to move it into production. It helps to see this in action.

Create the directory /apps/modules-4.2.0/picluster/munge and save the module file above as /apps/modules-4.2.0/picluster/munge/munge-0.5.13. Now if we list available modules, we can see it listed as munge/munge-0.5.13, but we can load the specific file just by typing module load munge. Here's what it looks like:

admin@baker:~> module avail
--------------------------- /apps/modules-4.2.0/modulefiles ----------------------------
dot  module-git  module-info  modules  null  use.own  

---------------------------- /apps/modules-4.2.0/picluster -----------------------------
munge/munge-0.5.13
admin@baker:~> which remunge
which: no remunge in (/apps/modules-4.2.0/bin:/home/admin/bin:/usr/local/bin:/usr/bin:/bin)
admin@baker:~> module add munge
admin@baker:~> which remunge
/apps/munge-0.5.13/bin/remunge
admin@baker:~> remunge
2018-10-31 05:12:18 Spawning 1 thread for encoding
2018-10-31 05:12:18 Processing credentials for 1 second
2018-10-31 05:12:19 Processed 2289 credentials in 1.001s (2286 creds/sec)
admin@baker:~>
admin@baker:~> module list
Currently Loaded Modulefiles:
 1) munge/munge-0.5.13
admin@baker:~>

From this output, we can see that the module file actually exists after we create it (module avail shows it now). Before we add the module, remunge is not in our path, but after we add the modulefile, we can run remunge without a full path. Now let's add a .version file to the same directory to make sure that version 0.5.13 gets loaded by default. We don't have any other versions available, but it's good practice to set the default anyway. The .version file looks like this:

#%Module1.0
##
set ModulesVersion "munge-0.5.13"

Save that file as /apps/modules-4.2.0/picluster/munge/.version. Now let's unload and load the munge module again to see how things look.

admin@baker:~> module unload munge
admin@baker:~> module list
No Modulefiles Currently Loaded.
admin@baker:~> module load munge
admin@baker:~> module list
Currently Loaded Modulefiles:
 1) munge/munge-0.5.13(default)  
admin@baker:~>

The add and del commands to module are synonymous with load and unload. Use whichever you're most comfortable with. I tend to go back and forth between them.

From this you can see that we can load the module with just the short name, and we get the long name of the module loaded. It also tells us that this is the default module for munge. Things look good! Let's move on to SLURM.

SLURM

Now we're going to set up our job scheduler. I've selected SLURM as the scheduler because it's still free, but you can buy commercial support if you're running in a production environment. It's also a fully featured job scheduler with a modular design that makes it easy to add and remove features. SLURM is really a great scheduler. It's still fairly young, in terms of job schedulers, but it has advanced rapidly, and offers all the features of more mature schedulers, with the advantage that it's much easier to configure, and it's fast. As you might expect, the job scheduler is a little more complicated than any of the software we've deployed so far. Fortunately, the package comes with examples and a configuration generator that will help us get up and running quickly. Let's get started.

There are three primary components of the job manager:

  • slurmd - the process that runs on compute nodes and runs jobs
  • slurmdbd - the database process that keeps track of jobs
  • slurmctld - the control process that manages scheduling jobs on compute nodes

Usually, the head node (management node) runs slurmctld and slurmdbd, and the compute nodes only run slurmd. So far, we don't have any compute nodes, so for testing purposes, let's set up the head node as a compute node as well. This won't provide a perfect test environment, since it won't test trusted communication between the compute nodes and the head node, or between the compute nodes, but it will at least give us an idea if we're on the right track.

To start, let's get the startup files in place so the job manager starts whenever we reboot. Our system is running systemd (sadly) to manage processes that start when our machine boots. If you look in /apps/source/slurm-19.05/etc, you'll find systemd configuration files for all three components of SLURM. Copy all of these (as root) to /usr/lib/systemd/system like this:

baker:~ # cd /apps/source/slurm-19.05/etc
baker:/apps/source/slurm-19.05/etc # cp *.service /usr/lib/systemd/system/
baker:/apps/source/slurm-19.05/etc #

Now we're almost ready to start up the job manager and see if everything works. If we keep going, though, we will find that we've forgotten something. The slurmdbd process expects to store its data in a MySQL database, which we haven't installed yet. We also haven't generated the initial job manager configuration file. Let's start with the database, since we'll need to provide a database username and password in the job manager configuration.

If we go back to the source code directory for SLURM where we configured and built the code and check the config.log file, we find this warning message:

configure: WARNING: *** mysql_config not found. Evidently no MySQL development libs installed on system.

We missed this the first time around, and slurmdbd won't work without it. This is a good lesson for us. We should be more careful about checking the output from our configure and build steps. It's also a good lesson on how to fix things, though. Even though slurmdbd isn't strictly required, we'll build and configure it as part of our deployment. The historical job data it holds is very useful in metrics reports to your stakeholders. These types of problems will crop up as we build more software packages, especially scientific software that we're not familiar with.
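
One habit that would have caught this is grepping the configure log for warnings before starting the build, for example:

grep -i warning config.log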

To fix this, let's install the MariaDB server and the development libraries, and run the configuration step again to see if we can fix this. Here's what I tried:

baker:~ # zypper install mariadb libmariadb-devel
Loading repository data...
Reading installed packages...
Resolving package dependencies...

The following 10 NEW packages are going to be installed:
  libJudy1 libltdl7 libmariadb3 libmariadb-devel libpq5 mariadb mariadb-client
  mariadb-errormessages psqlODBC unixODBC

The following recommended package was automatically selected:
  psqlODBC

10 new packages to install.
Overall download size: 17.6 MiB. Already cached: 0 B. After the operation, additional
149.4 MiB will be used.
Continue? [y/n/...? shows all options] (y): y
Retrieving package libpq5-10.5-lp150.3.3.1.aarch64
                                                  (1/10), 172.0 KiB (699.0 KiB unpacked)
Retrieving: libpq5-10.5-lp150.3.3.1.aarch64.rpm .....................[done (71.8 KiB/s)]
Retrieving package mariadb-errormessages-10.2.15-lp150.2.6.1.noarch
                                                  (2/10), 224.8 KiB (  2.2 MiB unpacked)
Retrieving: mariadb-errormessages-10.2.15-lp150.2.6.1.noarch.rpm .................[done]
Retrieving package libJudy1-1.0.5-lp150.1.2.aarch64
                                                  (3/10), 102.5 KiB (323.0 KiB unpacked)
Retrieving: libJudy1-1.0.5-lp150.1.2.aarch64.rpm .................................[done]
Retrieving package libltdl7-2.4.6-lp150.1.3.aarch64
                                                  (4/10),  35.4 KiB ( 66.6 KiB unpacked)
Retrieving: libltdl7-2.4.6-lp150.1.3.aarch64.rpm .................................[done]
Retrieving package libmariadb3-3.0.3-lp150.1.3.aarch64
                                                  (5/10), 113.7 KiB (354.4 KiB unpacked)
Retrieving: libmariadb3-3.0.3-lp150.1.3.aarch64.rpm ..................[done (1.0 KiB/s)]
Retrieving package unixODBC-2.3.6-lp150.1.2.aarch64
                                                  (6/10), 292.7 KiB (  2.3 MiB unpacked)
Retrieving: unixODBC-2.3.6-lp150.1.2.aarch64.rpm ....................[done (12.5 KiB/s)]
Retrieving package libmariadb-devel-3.0.3-lp150.1.3.aarch64
                                                  (7/10),  50.9 KiB (224.0 KiB unpacked)
Retrieving: libmariadb-devel-3.0.3-lp150.1.3.aarch64.rpm .........................[done]
Retrieving package psqlODBC-10.01.0000-lp150.1.2.aarch64
                                                  (8/10), 362.9 KiB (  1.2 MiB unpacked)
Retrieving: psqlODBC-10.01.0000-lp150.1.2.aarch64.rpm ............................[done]
Retrieving package mariadb-client-10.2.15-lp150.2.6.1.aarch64
                                                  (9/10),   1.0 MiB ( 25.7 MiB unpacked)
Retrieving: mariadb-client-10.2.15-lp150.2.6.1.aarch64.rpm .........[done (194.6 KiB/s)]
Retrieving package mariadb-10.2.15-lp150.2.6.1.aarch64
                                                 (10/10),  15.2 MiB (116.3 MiB unpacked)
Retrieving: mariadb-10.2.15-lp150.2.6.1.aarch64.rpm ..................[done (2.5 MiB/s)]
Checking for file conflicts: .....................................................[done]
( 1/10) Installing: libpq5-10.5-lp150.3.3.1.aarch64 ..............................[done]
( 2/10) Installing: mariadb-errormessages-10.2.15-lp150.2.6.1.noarch .............[done]
( 3/10) Installing: libJudy1-1.0.5-lp150.1.2.aarch64 .............................[done]
( 4/10) Installing: libltdl7-2.4.6-lp150.1.3.aarch64 .............................[done]
( 5/10) Installing: libmariadb3-3.0.3-lp150.1.3.aarch64 ..........................[done]
( 6/10) Installing: unixODBC-2.3.6-lp150.1.2.aarch64 .............................[done]
( 7/10) Installing: libmariadb-devel-3.0.3-lp150.1.3.aarch64 .....................[done]
( 8/10) Installing: psqlODBC-10.01.0000-lp150.1.2.aarch64 ........................[done]
Additional rpm output:
odbcinst: Driver installed. Usage count increased to 1. 
    Target directory is /etc/unixODBC

( 9/10) Installing: mariadb-client-10.2.15-lp150.2.6.1.aarch64 ...................[done]
Additional rpm output:
usermod: no changes

(10/10) Installing: mariadb-10.2.15-lp150.2.6.1.aarch64 ..........................[done]
Additional rpm output:
usermod: no changes

Update notifications were received from the following packages:
mariadb-10.2.15-lp150.2.6.1.aarch64 (/var/adm/update-messages/mariadb-10.2.15-lp150.2.6.1-something)
View the notifications now? [y/n] (n): y

(Use the Enter or Space key to scroll the text by lines or pages.)

Message from package mariadb:

You just installed MySQL server for the first time.

You can start it using:
 rcmysql start

During first start empty database will be created for your automatically.

PLEASE REMEMBER TO SET A PASSWORD FOR THE MariaDB root USER !
To do so, start the server, then issue the following commands:

'/usr/bin/mysqladmin' -u root password 'new-password'
'/usr/bin/mysqladmin' -u root -h <hostname> password 'new-password'

Alternatively you can run:
'/usr/bin/mysql_secure_installation'

which will also give you the option of removing the test
databases and anonymous user created by default. This is
strongly recommended for production servers.

-----------------------------------------------------------------------------

(Press 'q' to exit the pager.)
baker:~ # cd /apps/source/slurm-19.05
baker:/apps/source/slurm-19.05 # su admin
admin@baker:/apps/source/slurm-19.05> ./configure --prefix=/apps/slurm-19.05 --with-munge=/apps/munge-0.5.13 |& tee config.log2

A quick check of config.log2 shows that the configuration step found what it needed for MySQL support. Make a note of the messages from the MySQL install. We have a little configuration to do before we start slurmdbd, and we'll get to that shortly.

admin@baker:/apps/source/slurm-19.05> grep mysql config.log2
checking for mysql_config... /usr/bin/mysql_config
config.status: creating src/plugins/accounting_storage/mysql/Makefile
config.status: creating src/plugins/jobcomp/mysql/Makefile
admin@baker:/apps/source/slurm-19.05>

Now we can rebuild the software with MySQL support. SLURM is a fairly large software package, and it takes a while to build it on the Pi. Let's see how much time we can save by running the build in parallel. I'll remove all the output from the build, and show just the commands I ran and the timing:

Serial build:

make clean; make |& tee make.log
2282.90user 229.23system 42:28.03elapsed 98%CPU (0avgtext+0avgdata 131288maxresident)k
126128inputs+3158192outputs (106major+12449341minor)pagefaults 0swaps

Parallel build:

make clean; make -j 4 |& tee make-parallel.log
2603.61user 266.29system 21:22.78elapsed 223%CPU (0avgtext+0avgdata 130800maxresident)k
4632inputs+3151576outputs (8major+12414971minor)pagefaults 0swaps

We didn't really optimize our system for timing software builds (by turning off unneeded services, for example), but this still gives us a reasonable idea of what kind of speed increase we can get by using all our available processor cores instead of a single core. In this case, the elapsed build time decreased from 42 minutes, 28.03 seconds to 21 minutes, 22.78 seconds, so roughly twice as fast. Processor utilization went from 98% to 223%, so on average we kept a little more than two cores busy.

Now that we have SLURM built with MySQL support, we need to run the install step again to put the corrected files into place. We also need to create a modulefile so we can reference the software from our environment. For now, let's just use the template we used for munge. Copy this into /apps/modules-4.2.0/picluster/slurm/slurm-19.05 and create a matching /apps/modules-4.2.0/picluster/slurm/.version file:

#%Module1.0
## slurm modulefile
##
proc ModulesHelp { } {
    puts stderr "This module sets up the user environment for slurm"
}

module-whatis "Configures a user environment to use slurm"
set curMod [module-info name]
prepend-path PATH /apps/slurm-19.05/bin
prepend-path LIBPATH /apps/slurm-19.05/lib64
prepend-path MANPATH /apps/slurm-19.05/share/man

And the matching .version file (/apps/modules-4.2.0/picluster/slurm/.version):

#%Module1.0
##
set ModulesVersion "slurm-19.05"

Now that we have the module files set up, let's get SLURM configured and make sure it works for us. We'll start by getting MySQL set up for the job database. First, look back at the output from the MySQL install. It gives some basic commands for starting the database and securing it with a password. Let's start the database and secure it with /usr/bin/mysql_secure_installation. Since MySQL is a system service, I'll do the initial configuration as root.

baker:~ # rcmysql start
baker:~ # /usr/bin/mysql_secure_installation 

NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MariaDB
      SERVERS IN PRODUCTION USE!  PLEASE READ EACH STEP CAREFULLY!

In order to log into MariaDB to secure it, we'll need the current
password for the root user.  If you've just installed MariaDB, and
you haven't set the root password yet, the password will be blank,
so you should just press enter here.

Enter current password for root (enter for none): 
OK, successfully used password, moving on...

Setting the root password ensures that nobody can log into the MariaDB
root user without the proper authorisation.

Set root password? [Y/n] y
New password: 
Re-enter new password: 
Password updated successfully!
Reloading privilege tables..
 ... Success!

By default, a MariaDB installation has an anonymous user, allowing anyone
to log into MariaDB without having to have a user account created for
them.  This is intended only for testing, and to make the installation
go a bit smoother.  You should remove them before moving into a
production environment.

Remove anonymous users? [Y/n] y
 ... Success!

Normally, root should only be allowed to connect from 'localhost'.  This
ensures that someone cannot guess at the root password from the network.

Disallow root login remotely? [Y/n] y
 ... Success!

By default, MariaDB comes with a database named 'test' that anyone can
access.  This is also intended only for testing, and should be removed
before moving into a production environment.

Remove test database and access to it? [Y/n] y
 - Dropping test database...
 ... Success!
 - Removing privileges on test database...
 ... Success!

Reloading the privilege tables will ensure that all changes made so far
will take effect immediately.

Reload privilege tables now? [Y/n] y
 ... Success!

Cleaning up...

All done!  If you've completed all of the above steps, your MariaDB
installation should now be secure.

Thanks for using MariaDB!
baker:~ #

You'll notice as we work through the database installation and configuration that it's sometimes called MySQL and sometimes called MariaDB. This is because of a fork in the code. You can search the Internet and read all the details, but essentially Sun Microsystems bought the original MySQL, and then Oracle bought Sun. Some of the original developers were concerned about a free database being acquired by a commercial database company, so they created a fork of the code that is developed by the community as a permanently free database.

Now let's get the database set up so slurmdbd can talk to it. We need to create a user for slurmdbd to use for connections and give it permissions for the database. Here's what I did:

baker:~ # mysql -p
Enter password: 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 21
Server version: 10.2.15-MariaDB openSUSE package

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> create user 'slurmuser'@'localhost' identified by 'slurmpw72';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> grant all on slurm_acct_db.* to 'slurmuser'@'localhost';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> grant all on slurm_acct_db.* to 'slurmuser'@'baker';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> create database slurm_acct_db;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]>

slurmdbd needs a configuration file to tell it how to connect to the database. You can type man slurmdbd.conf to see the options available. If you do, you'll see we don't have the man pages installed, another oversight when building our image. zypper install man will fix that. Now, you can run module add slurm so the system can find the manual pages and executables for SLURM, and run man slurmdbd.conf again to see the man page. The following is the configuration file I used for slurmdbd. It should be saved as /apps/slurm-19.05/etc/slurmdbd.conf. Also, set the permissions on the file so nobody else can read it, since it has the database password in it.

# SLURM database daemon configuration
#      
AuthInfo=/var/run/munge/munge.socket.2
AuthType=auth/munge
DbdHost=localhost
DebugLevel=4
PurgeEventAfter=1month
PurgeJobAfter=12month
PurgeResvAfter=1month
PurgeStepAfter=1month
PurgeSuspendAfter=1month
PurgeTXNAfter=12month
PurgeUsageAfter=24month
LogFile=/var/log/slurmdbd.log
PidFile=/var/tmp/slurmdbd.pid
SlurmUser=admin
StoragePass=slurmpw72
StorageType=accounting_storage/mysql
StorageUser=slurmuser
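
Locking that file down is a quick one-liner; just make sure the account slurmdbd runs as (the SlurmUser above) still owns and can read it:

chmod 600 /apps/slurm-19.05/etc/slurmdbd.conf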

There's one more item that we need to fix. In the systemd service definition file for slurmdbd (/usr/lib/systemd/system/slurmdbd.service), the PID file is defined as /var/run/slurmdbd.pid. This needs to be changed to match our slurmdbd.conf file, which has the PID file listed as /var/tmp/slurmdbd.pid.
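
The edit itself is a single line in the [Service] section of that file; run systemctl daemon-reload afterward so systemd re-reads the unit:

PIDFile=/var/tmp/slurmdbd.pid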

Now, try starting slurmdbd with systemctl start slurmdbd.service and check to make sure it's running:

baker:~ # systemctl start slurmdbd.service
baker:~ # ps -ef | grep slurm | grep -v grep
admin    29119     1  0 19:23 ?        00:00:00 /apps/slurm-19.05/sbin/slurmdbd
baker:~ #

From this we can see that our database service is up and running, and it's running as a non-privileged user (admin). The first time it runs, it should create the database structure it needs to store accounting records. You can look in the MySQL database to make sure the structure got created if you want to verify that everything is working correctly.
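
If you want to peek, a quick query with the slurmuser account we created earlier will list the tables slurmdbd built:

mysql -u slurmuser -p -e 'show tables in slurm_acct_db;'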

Now let's set up our scheduler. SLURM comes with an HTML file that takes care of most of the configuration for you. It's located in the source directory, under doc/html/configurator.html (/apps/source/slurm-19.05/doc/html/configurator.html). This is a stand-alone web page, so you can copy it to your local system and open it in a web browser. Don't worry a lot about the values, as we will just be using this as a base to get started, and we'll adjust things as we go forward.
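
Since it's just a static HTML page, copying it to your workstation is enough; something like this works, adjusting the user and host names to your setup:

scp admin@baker:/apps/source/slurm-19.05/doc/html/configurator.html .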

There are a few items to make a note of, though. Be sure to enter the host name of your head node into the field that says SlurmctldHost. In the next section (Compute Machines), change the NodeName field to read compute[01-04]. This will give us four compute nodes to start. I've also changed my Partition Name field to prod because my users keep asking where the real cluster is when they see debug in that field. In the node configuration section directly below, erase the field that says CPUs and enter the following information for the fields under it:

Sockets: 1

CoresPerSocket: 4

ThreadsPerCore: 1

You should also change the StateSaveLocation from /var/spool to /var/spool/slurm to work around some permissions issues. (We'll need to create this directory and give the slurm user permission to write here.) Change the ProcessTracking selection to Pgid since we don't have cgroup support yet. Under Job Completion Logging, select FileTxt and for JobCompLoc enter /var/log/slurm/slurm.job.log (the SlurmDBD plugin still doesn't work yet). Under Job Accounting Gather select Linux. This will let us track some job metrics. Under Job Accounting Storage, again, select SlurmDBD. Under Process ID Logging, change both path names to /var/run/slurm/ instead of just /var/run/ for the PID files. This will keep our PID files together and, again, avoid permissions problems. Press the Submit button at the bottom, and your configuration will be displayed. You should end up with a file similar to this:

# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
SlurmctldHost=baker
#SlurmctldHost=
# 
#DisableRootJobs=NO 
#EnforcePartLimits=NO 
#Epilog=
#EpilogSlurmctld= 
#FirstJobId=1 
#MaxJobId=999999 
#GresTypes= 
#GroupUpdateForce=0 
#GroupUpdateTime=600 
#JobFileAppend=0 
#JobRequeue=1 
#JobSubmitPlugins=1 
#KillOnBadExit=0 
#LaunchType=launch/slurm 
#Licenses=foo*4,bar 
#MailProg=/bin/mail 
#MaxJobCount=5000 
#MaxStepCount=40000 
#MaxTasksPerNode=128 
MpiDefault=none
#MpiParams=ports=#-# 
#PluginDir= 
#PlugStackConfig= 
#PrivateData=jobs 
ProctrackType=proctrack/linuxproc
Prolog=/apps/slurm
#PrologFlags= 
#PrologSlurmctld= 
#PropagatePrioProcess=0 
#PropagateResourceLimits= 
#PropagateResourceLimitsExcept= 
#RebootProgram= 
ReturnToService=2
#SallocDefaultCommand= 
SlurmctldPidFile=/var/tmp/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root 
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/var/spool/slurm
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/affinity
TaskPluginParam=Sched
#TaskProlog=
#TopologyPlugin=topology/tree 
#TmpFS=/tmp 
#TrackWCKey=no 
#TreeWidth= 
#UnkillableStepProgram= 
#UsePAM=0 
# 
# 
# TIMERS 
#BatchStartTimeout=10 
#CompleteWait=0 
#EpilogMsgTime=2000 
#GetEnvTimeout=2 
#HealthCheckInterval=0 
#HealthCheckProgram= 
InactiveLimit=0
KillWait=30
#MessageTimeout=10 
#ResvOverRun=0 
MinJobAge=300
#OverTimeLimit=0 
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60 
#VSizeFactor=0 
Waittime=0
# 
# 
# SCHEDULING 
#DefMemPerCPU=0 
FastSchedule=1
#MaxMemPerCPU=0 
#SchedulerTimeSlice=30 
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core
# 
# 
# JOB PRIORITY 
#PriorityFlags= 
#PriorityType=priority/basic 
#PriorityDecayHalfLife= 
#PriorityCalcPeriod= 
#PriorityFavorSmall= 
#PriorityMaxAge= 
#PriorityUsageResetPeriod= 
#PriorityWeightAge= 
#PriorityWeightFairshare= 
#PriorityWeightJobSize= 
#PriorityWeightPartition= 
#PriorityWeightQOS= 
# 
# 
# LOGGING AND ACCOUNTING 
#AccountingStorageEnforce=0 
#AccountingStorageHost=
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/slurmdbd
#AccountingStorageUser=
AccountingStoreJobComment=YES
ClusterName=cluster
#DebugFlags= 
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompLoc=/var/log/slurm/slurm.job.log
JobCompType=jobcomp/filetxt
#JobCompUser=
#JobContainerType=job_container/none 
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
#SlurmdLogFile=
#SlurmSchedLogFile= 
#SlurmSchedLogLevel= 
# 
# 
# POWER SAVE SUPPORT FOR IDLE NODES (optional) 
#SuspendProgram= 
#ResumeProgram= 
#SuspendTimeout= 
#ResumeTimeout= 
#ResumeRate= 
#SuspendExcNodes= 
#SuspendExcParts= 
#SuspendRate= 
#SuspendTime= 
# 
# 
# COMPUTE NODES 
NodeName=compute[01-04] Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN 
PartitionName=prod Nodes=compute[01-04] Default=YES MaxTime=INFINITE State=UP

We don't have any real compute nodes yet, so let's add our head node as a compute node just to test that our configuration actually works. At the end of the configuration file, add another NodeName line that looks like this:

NodeName=baker Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN

That should be enough to get us going. Save that into /apps/slurm-19.05/etc/slurm.conf. Let's create a user named slurm to run the job scheduler, and then create /var/spool/slurm and assign ownership to the slurm user.

baker:~ # useradd -m slurm
baker:~ # mkdir /var/spool/slurm/
baker:~ # chown slurm:root /var/spool/slurm/
baker:~ # chmod 0700 /var/spool/slurm
baker:~ #
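
One thing the configuration file references that we haven't created yet is /var/log/slurm, which is where slurmctld writes its log and where the job completion log goes. Something like this should cover it:

mkdir /var/log/slurm
chown slurm:root /var/log/slurm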

Before we try to start the scheduler, let's edit the systemd service configuration file to change the location of the PID files, as we did for slurmdbd. Edit /usr/lib/systemd/system/slurmctld.service and change the PIDFile location to /var/tmp/slurmctld.pid. We'll revisit these PID files later to see why the default doesn't work and how we can fix it.

We should be close to a working state now. Let's give it a try and see if the scheduler starts up.

baker:~ # systemctl start slurmctld
baker:~ # ps -ef | grep slur
slurm    31584     1  0 00:59 ?        00:00:00 /apps/slurm-19.05/sbin/slurmdbd
slurm    32094     1  5 02:08 ?        00:00:00 /apps/slurm-19.05/sbin/slurmctld
root     32109  5856  0 02:08 pts/0    00:00:00 grep --color=auto slur
baker:~ #

Things at least look good. Let's see if we can run some common commands.

admin@baker:~> sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
prod*        up   infinite      4  down* compute[01-04]
prod*        up   infinite      1  down* baker
admin@baker:~>

That's promising. All of our nodes are currently down. Compute nodes 1 through 4 are down because they don't exist, and the management node (baker) is down because we haven't started slurmd on it yet. Again, it's not common to run jobs on your management node, but just for testing purposes, let's start slurmd and see if we can run a simple job.

baker:/var/log/slurm # systemctl start slurmd
baker:/var/log/slurm # srun /bin/hostname
baker
baker:/var/log/slurm #

Everything looks good. We used the srun command to run a quick job to check the host name, and we can see it's running on node baker, as expected. Let's set all our required processes to start on reboot, and reboot just to make sure everything comes up as expected:

baker:~ # systemctl enable munge
Created symlink /etc/systemd/system/multi-user.target.wants/munge.service → /usr/lib/systemd/system/munge.service.
baker:~ # systemctl enable mysql
Created symlink /etc/systemd/system/mysql.service → /usr/lib/systemd/system/mariadb.service.
Created symlink /etc/systemd/system/multi-user.target.wants/mariadb.service → /usr/lib/systemd/system/mariadb.service.
baker:~ # systemctl enable slurmdbd
Created symlink /etc/systemd/system/multi-user.target.wants/slurmdbd.service → /usr/lib/systemd/system/slurmdbd.service.
baker:~ # systemctl enable slurmctld
Created symlink /etc/systemd/system/multi-user.target.wants/slurmctld.service → /usr/lib/systemd/system/slurmctld.service.
baker:~ # reboot

baker:~ # ps -ef | egrep 'slurm|munge'
munge     1242     1  0 03:30 ?        00:00:00 /apps/munge-0.5.13/sbin/munged
slurm     1250     1  0 03:30 ?        00:00:00 /apps/slurm-19.05/sbin/slurmdbd
slurm     1252     1  0 03:30 ?        00:00:00 /apps/slurm-19.05/sbin/slurmctld
root      1476  1438  0 03:32 pts/0    00:00:00 grep -E --color=auto slurm|munge
baker:~ # sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
prod*        up   infinite      1  down* baker
baker:~ #

It's a little odd that our compute nodes have disappeared from the list, but let's not worry about it for now. We'll check again once we have the nodes set up. Let's get pdsh set up to help with configuring the compute nodes.

pdsh

pdsh is a Parallel Distributed SHell that will run the same command on multiple nodes in parallel. You can write a simple for loop if you only need to work with a few nodes, but that doesn't scale well to large numbers of nodes. pdsh will not only run commands in parallel, but will also collect the output and collate it into an easily readable report. We'll see this when we start running some examples.
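
To make the collation point concrete, here's roughly what a pdsh run will look like once the compute nodes exist; dshbak ships with pdsh and folds identical output from different nodes together (the node names are just the ones we'll be using):

pdsh -w compute[01-04] uname -r | dshbak -c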

Let's start with the modulefile for pdsh. Starting with the same template we've been using, our modulefile should look like this:

#%Module1.0
## pdsh modulefile
##
proc ModulesHelp { } {
    puts stderr "This module sets up the user environment for pdsh"
}

module-whatis "Configures a user environment to use pdsh"
set curMod [module-info name]
prepend-path PATH /apps/pdsh-2.33/bin
prepend-path LIBPATH /apps/pdsh-2.33/lib64
prepend-path MANPATH /apps/pdsh-2.33/share/man

Save that as /apps/modules-4.2.0/picluster/pdsh/pdsh-2.33 and save the following file as /apps/modules-4.2.0/picluster/pdsh/.version:

#%Module1.0
##
set ModulesVersion "pdsh-2.33"

Now you should be able to add the module:

admin@baker:/apps/modules-4.2.0/picluster> cd
admin@baker:~> module add pdsh
admin@baker:~> pdsh -a date
pdsh@baker: /etc/dsh/nodes: No such file or directory
admin@baker:~>

It seems that pdsh is working, but we asked it to run the date command on every node it knows about, and we got an error. It seems pdsh doesn't know about any nodes. When we built pdsh, we told it where to find the list of nodes in the configuration step (--with-machines=/etc/dsh/nodes). Let's put our compute nodes into that file and try again.

baker:~ # cat > /etc/dsh/nodes
compute01
compute02
compute03
compute04
baker:~ #

The nodes listed in /etc/dsh/nodes are the default list of nodes used when I ask for all nodes with the -a flag. Since our compute nodes still don't exist, we're not going to get any useful output, but we should at least get a different error.

admin@baker:~> pdsh -a date
compute03: ssh: connect to host compute03 port 22: No route to host
compute02: ssh: connect to host compute02 port 22: No route to host
compute01: ssh: connect to host compute01 port 22: No route to host
compute04: ssh: connect to host compute04 port 22: No route to host
pdsh@baker: compute02: ssh exited with exit code 255
pdsh@baker: compute03: ssh exited with exit code 255
pdsh@baker: compute01: ssh exited with exit code 255
pdsh@baker: compute04: ssh exited with exit code 255
admin@baker:~>

That seems more reasonable. Let's build our compute nodes so we can do some real testing.