Linux Cluster (caucluster)

Students: no | Employees: yes | Faculties: yes | Student unions: no


The caucluster is an x86-based Linux cluster that is particularly suitable for serial, moderately parallel and high-memory computations. It features the following hardware and software structure:

  • CentOS operating system,
  • BeeGFS as cluster wide parallel work file system,
  • Slurm batch system with fair-share scheduling,
  • module environment for software deployment.

 


 

Status

Login nodes and file systems

Current status of the two login nodes and of the home and work file systems:

 

Cluster utilization

The following figures show the raw usage of the individual fair-share accounts as well as the core utilization over the last seven days:

Currently, there are eight accounts (labeled '#1' to '#8' in the figure) on top of the default CAU account (labeled '#cau').

Remarks:

  • Note that the fair-share raw usage decays over time according to an exponential decay law.
  • Note that cores may idle for various reasons, e.g., due to existing reservations for queued jobs or due to running jobs that require only a few cores per node but a lot of main memory.
  • Accounts: #1: Bauer, #2: Bonitz, #3: Dagan, #4: Heber, #5: Heinze, #6: Kaleta, #7: Pehlke, #8: Srivastav.

 

Hardware

 

Login nodes (front ends)

  • 2x INTEL Sandy Bridge, 16 cores (2.6GHz), 128GB main memory
  • Generic login: caucluster.rz.uni-kiel.de

 

Compute nodes (batch nodes)

  • 4x INTEL Sandy Bridge, 16 cores (2.6GHz), 256GB main memory; available
  • 36x INTEL Sandy Bridge, 16 cores (2.6GHz), 128GB main memory; available
  • 12x INTEL Sandy Bridge, 16 cores (2.6GHz), 128GB main memory (previously: "focean"); available
  • 4x INTEL Ivy Bridge, 16 cores (2.6GHz), 64GB main memory (previously: "angus"); available
  • 2x INTEL Ivy Bridge, 32 cores (2.6GHz), 1024GB main memory (previously: "fobigmem"); available
  • 7x INTEL Haswell, 40 cores (1.9GHz), 256GB main memory (previously: "fairq" [old caucluster]); partially available (6x)
  • 4x INTEL Haswell, 32 cores (2.3GHz), 256GB main memory (previously: "msb"); available
  • 3x INTEL Haswell, 16 cores (2.4GHz), 128GB main memory (previously: "angus"); available
  • 5x INTEL Haswell, 16 cores (2.4GHz), 256GB main memory (previously: "angus"); available
  • 1x INTEL Broadwell, 24 cores (2.2GHz), 1024GB main memory (previously: "kaleta1TB"); available
  • 1x INTEL Broadwell, 24 cores (2.2GHz), 1024GB main memory (previously: "genomik1"); available
  • 4x AMD Epyc (Naples), 32 cores (2.4GHz), 256GB main memory; available
  • 3x AMD Epyc (Rome), 64 cores (2.35GHz), 512GB main memory; available

 

File systems

  • $HOME file system: 22TB, user quota, /home/<username>
  • $WORK file system: 350TB BeeGFS storage, user quota, /work_beegfs/<username>

 

Access

 

User account

To apply for an account on the caucluster, please fill in and sign form No. 3, "Request for use of a High-Performance Computer", and send it to the Computing Centre's user administration office. The form and further information are available here.

System access

From within the campus network of Kiel University, the login nodes of the caucluster are accessible via SSH:

ssh -X <username>@caucluster.rz.uni-kiel.de
  • The additional option -X activates X11 forwarding, which is required, e.g., for GUI applications.
  • Suitable SSH clients for Windows PCs are MobaXterm, X-Win32 or PuTTY. MobaXterm and WinSCP can be used for data transfer.
  • From outside the campus network (e.g., from home), you first need to establish a VPN connection to this network using the same username. To apply for VPN dial-up access to the Computing Centre, please fill in and hand in form No. 1; see here for details.

 

File systems

 

The caucluster offers different file systems to handle data. Each of these file systems has its own purpose and features. Therefore, please follow the guidelines below!

Quota information for the home and work file systems can be displayed with the command

workquota

 

Home file system

  • Contains the user's home directory, which is also accessible via the environment variable $HOME.
  • There will be a regular backup (typically daily) of your data in $HOME.
  • Disk space is limited by user quotas. Default quotas: 100GB (soft) and 150GB (hard).
  • Suited for software, programs, code, scripts and a small amount of results that necessarily need a backup.
  • The home directory should not be used for batch calculations. Thus, please change to your work directory first!

 

Work file system

  • Contains the user's work directory, which is also accessible via the environment variable $WORK.
  • There is no backup of your data in $WORK. Thus, please be careful!
  • Disk space and chunk files (1 million chunks correspond to roughly 250000 files) are limited by user quotas. Default quotas: 1TB of disk space and 250000 chunk files.
  • Parallel BeeGFS file system. More details are collected on a separate page (please read!).
  • Should be used for batch calculations!

 

Local disk space

  • Disk space that is directly attached to a compute node.
  • The size of the local disk depends on the respective node; see the sinfo command in the section 'Batch processing'.
  • Accessible within a batch job via the environment variable $TMPDIR.
  • Suited for I/O-intensive computations due to fast access times.
  • Attention: Local disk space is only available during a running batch job, i.e., all data on the local disk will automatically be removed after job termination.
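The guidelines above can be sketched in a minimal job script; the program name test.x and the data file names are placeholders, not part of the cluster's software:

```shell
#!/bin/bash
#SBATCH --job-name=localdisk-test
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1000
#SBATCH --time=01:00:00
#SBATCH --partition=all

# copy the input data from the work directory to the fast local disk
cp $WORK/input.dat $TMPDIR/
cd $TMPDIR

# run the I/O-intensive computation on the local disk (test.x is a placeholder)
$WORK/test.x input.dat > output.dat

# copy the results back before the job ends -- $TMPDIR is wiped after job termination
cp output.dat $WORK/
```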

 

Tape library

  • Disk space with automatic file relocation on magnetic tapes.
  • Accessible from all login nodes via the environment variable $TAPE.
  • Suited for storing non-active data that is currently not in use.
  • Not suited for storing a large number of small files.
  • Data must be compressed in the form of file archives (e.g., tar files) before transmission to $TAPE. Recommended file sizes: 3-50GB. File sizes must not exceed 1TB due to the limited tape capacity.
  • Data transfer to and from the tape library must not be performed with the rsync command!
  • Attention: Do not work directly with your files in the tape library directory! Instead, copy the desired file(s) back to the work directory before further processing. Note that this also includes unpacking file archives.
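Following the rules above, a typical round trip to the tape library might look like this sketch (the archive and directory names are only placeholders):

```shell
# pack the results into a compressed tar archive in the work directory
cd $WORK
tar -czf results_2020.tar.gz results/

# transfer the archive to the tape library (do NOT use rsync here)
cp results_2020.tar.gz $TAPE/

# later: copy the archive back to $WORK and unpack it there,
# never directly in the tape directory
cp $TAPE/results_2020.tar.gz $WORK/
cd $WORK && tar -xzf results_2020.tar.gz
```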

 

External file servers

To access (project data) storage on the Samba/Windows file servers of the Computing Centre, you can use the command smbclient, which is available on the login nodes of the caucluster. Here are some examples:

# display contents of a specific directory
smbclient //<file-server>/<path> -c 'cd <directory>; ls' -U uni-kiel/<username>

# put a file called 'test.dat' onto the file server
smbclient //<file-server>/<path> -c 'cd <directory>; put test.dat' -U uni-kiel/<username>

# get a file called 'test.dat' from the file server
smbclient //<file-server>/<path> -c 'cd <directory>; get test.dat' -U uni-kiel/<username>
For more detailed information see, e.g., the command man smbclient.

 

Software

 

Operating system

  • CentOS 7.x
  • Bash as the supported default Unix shell.

 

Module environment

User software, including compilers and libraries, is mainly deployed via Lmod environment modules, which perform all actions necessary to activate the respective software package. To list all available environment modules, use the command

module avail

To activate and deactivate a software package, use the load and unload sub-commands. To use the GCC compiler version 9.2.0, for example, the sequence of commands would be

module load gcc/9.2.0
gcc ...
module unload gcc/9.2.0
# or to deactivate all loaded modulefiles
module purge

To list all loaded modulefiles, for more information about a specific package and a complete list of module commands, see the list, show and help sub-commands:

module list
module show gcc/9.2.0
module help
In case you want to use Matlab, Python, R or Perl, please also read the software-specific information in the sections below.
Note that, if not otherwise specified, all software packages available via modulefiles have been built with the GCC compiler version 9.2.0.
Modulefiles with the suffix "-intel", such as "boost-intel/1.70.0", have been built with the INTEL compiler version 18.0.4.

 

Compilers

To compile source code, additional compilers exist on top of the system's default compiler (GCC version 4.8.5); see the compilers section displayed by the module avail command. Examples:

GCC compilers:

module load gcc/9.2.0
gfortran ...
gcc ...
g++ ...

Intel compilers:

module load intel/18.0.4
ifort ...
icc ...
icpc ...

Intel MPI compilers:

module load intel/18.0.4 intelmpi/18.0.4
mpiifort ...
mpiicc ...
mpiicpc ...

 

The Cambridge Structural Database (CSD)

The most recent version of the CSD software is currently purchased with a node-locked license, which means that we can deploy the software on only one of the two login nodes. To use CSD commands like mercury or cq, you therefore have to explicitly log in to caucluster2.rz.uni-kiel.de.

Here is an example:

# ssh-login: ssh -X <username>@caucluster2.rz.uni-kiel.de
module load csd/2020
mercury
cq

 

Matlab

In order to use Matlab within a batch job, you need to use the Matlab Compiler to first generate a binary from your *.m file. This binary can then be called directly in your batch script after loading the corresponding matlab module, e.g., via module load matlab/2018b. The great advantage of this procedure is that the binary does not depend on the availability of any Matlab license; moreover, special Matlab toolboxes can easily be integrated.

Here is an example:

module load matlab/2018b
mcc -N -v -R -singleCompThread -m test.m -I /home/sw/matlab/matlab2018b/usr/toolbox/bioinfo/bioinfo

The above lines take a Matlab file called test.m, which here explicitly depends on the bioinformatics toolbox, and generate a binary called test, which in turn can be executed as

module load matlab/2018b
./test
If you have problems using the Matlab Compiler, please do not hesitate to contact us via hpcsupport@rz.uni-kiel.de. As an alternative to the Matlab Compiler, you may also try Octave (module load octave/4.4.1) to run your *.m file.
Particularly in batch calculations, please adapt the number of compute threads to the number of requested cores, e.g., by setting maxNumCompThreads(N) in your .m file.

When executing compiled Matlab code, Matlab creates a hidden cache containing binaries, links, scripts, etc. By default this cache is located in the user's home directory (i.e., ~/.mcrCache<version>/), and the performance of the compiled code can thus be poor. To increase the performance during a batch calculation, please, relocate the cache to the local disk by setting

export MCR_CACHE_ROOT=$TMPDIR
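Putting the pieces together, a batch script for a compiled Matlab binary might look like this sketch (the binary test is the one produced by mcc above; the resource values are only examples):

```shell
#!/bin/bash
#SBATCH --job-name=matlab-test
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4000
#SBATCH --time=01:00:00
#SBATCH --partition=all

module load matlab/2018b

# relocate the MCR cache to the fast local disk of the compute node
export MCR_CACHE_ROOT=$TMPDIR

# run the binary generated by mcc
./test
```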

 

Python

The global Python installations available as software modules, e.g., via module load python/3.7.4, contain only the base installations and therefore do not include any additional Python packages. Moreover, we explicitly refrain from installing extra packages globally on demand in order to avoid package conflicts. The inclusion of additional packages, such as numpy, scipy, or tensorflow, is however easily possible by using the concept of virtual environments.

Here are the main steps for working with virtual environments:

A. Python version 2.x

1. Creating a virtual environment

To create a virtual environment called my_env, decide upon a directory where you want to place it in your $HOME directory (e.g., /home/<username>/my_python2_env), and run the following commands:

module load python/2.7.16
python -m ensurepip --user
python -m pip install virtualenv --user
mkdir /home/<username>/my_python2_env
python -m virtualenv /home/<username>/my_python2_env/my_env

2. Installing a package into a virtual environment

To install, upgrade or remove a Python package, activate the virtual environment and use the module called pip. As an example, let us install numpy:

module load python/2.7.16
source /home/<username>/my_python2_env/my_env/bin/activate
module load gcc/9.2.0
python -m pip install numpy
deactivate

As compiler, we suggest using the GCC compiler version 9.2.0 by loading the corresponding module (module load gcc/9.2.0) prior to any call of python -m pip. Moreover, note the deactivate command at the end, which reverts the settings made by the activating source command.

3. Using the installed package

To use any package that has been installed into a virtual environment called my_env do

module load python/2.7.16
source /home/<username>/my_python2_env/my_env/bin/activate

...

deactivate

where the package is usable between the source and the deactivate command.

B. Python version 3.x

1. Creating a virtual environment

To create a virtual environment called my_env, decide upon a directory where you want to place it in your $HOME directory (e.g., /home/<username>/my_python3_env), and run the following commands:

module load python/3.7.4
mkdir /home/<username>/my_python3_env
python3 -m venv /home/<username>/my_python3_env/my_env

2. Installing a package into a virtual environment

To install, upgrade or remove a Python package, activate the virtual environment and use the program called pip3. As an example, let us install numpy:

module load python/3.7.4
source /home/<username>/my_python3_env/my_env/bin/activate
module load gcc/9.2.0
pip3 install numpy
deactivate

As compiler, we suggest using the GCC compiler version 9.2.0 by loading the corresponding module (module load gcc/9.2.0) prior to any call of pip3. Moreover, note the deactivate command at the end, which reverts the settings made by the activating source command.

3. Using the installed package

To use any package that has been installed into a virtual environment called my_env do

module load python/3.7.4
source /home/<username>/my_python3_env/my_env/bin/activate

...

deactivate

where the package is usable between the source and the deactivate command.

 

R

The global R installations available as software modules are only base installations containing the standard packages, and we explicitly refrain from installing additional packages globally on demand in order to avoid package conflicts. Additional packages can easily be installed locally by each user with the install.packages() function in R.

In order to install an additional package please carry out the following steps:

1. Create a new directory in your home directory into which you would like to install the R packages, e.g., R_libs, and add this new directory to the $R_LIBS environment variable

mkdir R_libs 

export R_LIBS=$HOME/R_libs:$R_LIBS

2. Load the R and the gcc/9.2.0 software modules and install the needed packages within R using the install.packages() function (e.g., the lattice package)

module load R/3.6.1 gcc/9.2.0
R 
> install.packages("lattice",lib="~/R_libs")
--- Please select a CRAN mirror for use in this session ---
Secure CRAN mirrors
...
31: Germany (Göttingen) [https]      32: Germany (Münster) [https]
33: Germany (Regensburg) [https]     34: Greece [https]

Selection: 31

3. To load a locally installed R package, use the library command with the parameter lib.loc as:

> library("lattice",lib.loc="~/R_libs")

 

 

Perl

To avoid Perl module conflicts, Perl should be installed locally by each individual user using the management tool Perlbrew. To get Perlbrew and to install a specific version of Perl, please follow this example:

curl -L https://install.perlbrew.pl | bash
source ~/perl5/perlbrew/etc/bashrc
perlbrew install --notest perl-5.30.2

In order to list all locally available Perl versions and to activate and use the installation, use the commands

source ~/perl5/perlbrew/etc/bashrc
perlbrew list
perlbrew switch perl-5.30.2

Finally, to install additional Perl modules please follow these lines:

# install cpanm
source ~/perl5/perlbrew/etc/bashrc
perlbrew install-cpanm

# use cpanm to install required module
cpanm Math::FFT

# check if module can be found (should produce no error message)
perl -e "use Math::FFT"

 

Batch processing

 

Fair-share

On the caucluster, the batch system (Slurm) does not assign job priority on a First In - First Out (FIFO) basis. Instead, a multi-factor fair-share algorithm schedules user jobs based on the portion of the computing resources (= allocated cores * seconds) they have been allocated and the resources they have already consumed. This guarantees a fair distribution of the overall available compute resources among all users.

Importantly, the fair-share algorithm does not involve a fixed allotment whereby a user's access to the compute resources is cut off completely once that allotment is reached. Instead, queued jobs are prioritized such that under-serviced user accounts are scheduled first, while over-serviced user accounts are scheduled later (when the system would otherwise go idle). Moreover, a user's consumed resources decay over time according to an exponential decay law.

To list your own current (raw) usage and the resulting (raw) shares, use the command

# note the capital "U"
sshare -U

 

Performing batch calculations

To run a batch calculation it is not only important to instruct the batch system which program to execute, but also to specify the required resources, such as number of nodes, number of cores per node, main memory or computation time. These resource requests are written together with the program call into a so-called batch or job script, which is submitted to the batch system with the command

sbatch <jobscript>
Note that every job script starts with the directive #!/bin/bash on the first line. The subsequent lines contain the directive #SBATCH, followed by a specific resource request or other job information; see the next section for a parameter overview and the template section for examples. At the end of the job script, one finally loads the modulefiles (if required) and thereafter specifies the program call.
The default Slurm partition is called 'all' and allows for batch jobs with a maximum walltime of 48 hours.

After job submission, the batch server evaluates the job script, searches for free, appropriate compute resources and, when possible, executes the actual computation or queues the job.

Successfully submitted jobs are managed by the batch system and can be displayed with the following commands

squeue

# or
squeue -u <username>

# or for showing individual job details
squeue -j <jobid>
scontrol show job <jobid>

To terminate a running or to remove a queued job from the batch server use

scancel <jobid>

Further batch system commands:

# gather resource information of a running job
sstat -j <jobid>.batch

# general partition information
sinfo

# show node list incl. available cpus, memory, features, local disk space
sinfo --Node -o "%20N  %20c  %20m  %20f  %20d"

Special requests (e.g., longer walltimes) can be defined by setting a specific quality of service:

# gather information about available quality of services
sacctmgr show qos
The default quality of service is 'normal'.
Note that the quality of service named 'special' is available only on request and that the quality of service named 'internal' is for system admins only.
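As a sketch, once a quality of service beyond 'normal' has been granted to your account, it could be requested in the job script like this (the concrete walltime value is only an example; whether 'special' applies to your account must be clarified with the support team first):

```shell
#SBATCH --qos=special
#SBATCH --time=96:00:00   # example: a walltime beyond the 48h limit of the 'all' partition
```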

 

Job information

To obtain a summary of the resources that have been consumed by your batch calculation (inter alia: job ID, node list, elapsed time and maximum main memory), you can include the following line at the very end of your batch script:

jobinfo

# the option '-s' additionally creates a file with the suffix .sacct
# that includes accounting data for all performed Slurm steps
jobinfo -s
Note that the information is written to the stdout path unless piped directly to a file.
Moreover, the jobinfo tool will not work outside a batch job and is still under development, and thus subject to change. If you encounter problems, please report them to hpcsupport@rz.uni-kiel.de.

 

Job reason codes

A batch job may be waiting for more than one reason. The following (incomplete) list of codes identifies why a job is waiting for execution:

  • Priority: One or more higher priority jobs exist for this partition or advanced reservation.
  • AssocGrpCPURunMinutesLimit: The job would exceed the limit on the total CPU time that your account's running jobs may have allocated at once.
  • QOSMaxCpuPerUserLimit: The job is not allowed to start because your currently running jobs consume all allowed cores for your username.

 

For a complete list of job reason codes, see here.

 

Batch parameters

The following table summarizes the most important job parameters. For more information, see here.

Parameter Explanation
#SBATCH Slurm batch script directive
--partition=<name> or -p <name> Slurm partition (~batch class)
--job-name=<name> or -J <jobname> Job name
--output=<filename> or -o <filename> Stdout file
--error=<filename> or -e <filename> Stderr file; if not specified, stderr is redirected to stdout file
--nodes=<nnodes> or -N <nnodes> Number of nodes
--ntasks-per-node=<ntasks> Number of tasks per node; number of MPI processes per node
--cpus-per-task=<ncpus> or -c <ncpus> Number of cores per task or process
--mem=<size[units]> Real memory required per node; default unit is megabytes (M); use G for gigabytes
--time=<time> or -t <time> Walltime in the format "hours:minutes:seconds"
--no-requeue Never requeue the job
--constraint=<feature> or -C <feature> Request a special node feature, see command sinfo above for available features
--qos=<qos-name> or -q <qos-name> Define a quality of service
--mail-user=<email-address> Set email address for notifications
--mail-type=<type> Type of email notification: BEGIN, END, FAIL or ALL

Some environment variables that are defined by Slurm and are usable inside a job script:

Variable Explanation
$SLURM_JOBID Contains the job ID
$SLURM_NODELIST Contains the list of nodes on which the job is running
$SLURM_JOB_USER Contains the username
$SLURM_JOB_NAME Contains the name of the job as specified by the parameter --job-name

Note that during job submission, Slurm automatically transfers all environment variables defined in the shell so far to the batch job. As this can lead to undesired job behavior, it is advisable to disable this feature with the following job parameter:

#SBATCH --export=NONE

 

Special batch jobs

To run batch calculations with interactive access, Slurm provides two options:

1. Interactive jobs without X11 support

srun --pty --nodes=1 --cpus-per-task=1 --mem=1000 --time=00:10:00 /bin/bash
  • This command gives access to a remote shell on a compute node of the default partition with the given resources. Type exit to close the interactive session.

2. Interactive jobs with X11 support

srun --x11 --pty --nodes=1 --cpus-per-task=1 --mem=1000 --time=00:10:00 /bin/bash
  • Type exit to close the interactive session.

 

Batch script templates

In this section, you will find templates for performing typical types of batch jobs.

1. Running a serial calculation

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1000
#SBATCH --time=01:00:00
#SBATCH --output=test.out
#SBATCH --error=test.err
#SBATCH --partition=all

export OMP_NUM_THREADS=1
./test.x

2. Running a shared-memory OpenMP-parallel calculation

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=1000
#SBATCH --time=01:00:00
#SBATCH --output=test.out
#SBATCH --error=test.err
#SBATCH --partition=all

export OMP_NUM_THREADS=8
./test.x

3. Running a multi-node Intel-MPI-parallel calculation

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=2
#SBATCH --tasks-per-node=16
#SBATCH --cpus-per-task=1
#SBATCH --mem=64000
#SBATCH --time=01:00:00
#SBATCH --output=test.out
#SBATCH --error=test.err
#SBATCH --partition=all

export OMP_NUM_THREADS=1
module load intel/18.0.4 intelmpi/18.0.4
mpirun -np 32 ./test.x

4. Running a multi-node hybrid OpenMP+MPI calculation

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=2
#SBATCH --tasks-per-node=8
#SBATCH --cpus-per-task=2
#SBATCH --mem=64000
#SBATCH --time=01:00:00
#SBATCH --output=test.out
#SBATCH --error=test.err
#SBATCH --partition=all

export OMP_NUM_THREADS=2
module load intel/18.0.4 intelmpi/18.0.4
mpirun -np 16 ./test.x

5. Running a job array

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --array=0-9
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1000
#SBATCH --time=01:00:00
#SBATCH --output=test-%A_%a.out
#SBATCH --error=test-%A_%a.err
#SBATCH --partition=all

export OMP_NUM_THREADS=1
echo "Hi, I am task $SLURM_ARRAY_TASK_ID in the job array $SLURM_ARRAY_JOB_ID"

 


Support and Consulting

HPC Support Team: hpcsupport@rz.uni-kiel.de
Responsible contact persons at the Computing Centre:
Please see HPC Support and Consulting.