Linux-Cluster (caucluster)

Students: no | Employees: yes | Faculties: yes | Student unions: no


The caucluster is a Linux cluster that is currently separate from the rzcluster and has been established jointly by several institutions of Kiel University and the Computing Centre. In the future it is intended to form the nucleus of a new compute cluster for the whole university. On this system the Computing Centre is currently evaluating the batch system Slurm in combination with a fair-share operating model, which assigns user groups shares of the resources corresponding to their share of the financing. The system consists of 7 compute nodes with a total of 280 cores and 256GB of main memory per node.


 

Information on the new BeeGFS file system is collected here.

 

Hardware

1 front end (login node)

  • 8 cores (8-core Intel Xeon E5-2640 @ 2.60GHz)
  • 64GB main memory
  • Node name: rzcl00bn

7 nodes (batch nodes)

  • 40 cores (4 × 10-core Intel Xeon E7-4820 @ 1.90GHz)
  • 256GB main memory
  • Node names: cau001, ..., cau007
  • Interconnection: InfiniBand
  • 3.7TB local disk space
  • Extendable with 4 accelerator cards (Intel Xeon Phi)

Operating system

  • CentOS Linux 7.x

 

Access

User account

  • To apply for a caucluster account, please fill in and sign the request for use of a High-Performance Computer (please mark Linux-Cluster (rzcluster) with a cross) and send the form to the University Computing Centre.

  • IMPORTANT: The usage permission is granted until February 15th of the following year. If the permission is not extended, all data stored on disk or tape will be deleted after 6 months. You or your supervisor will be sent an extension form for another year in good time.

System access

  • Computations on the caucluster should primarily be performed in batch mode. Only short interactive operations (e.g. compilation, test of scripts or programs) should be executed on the front end node.
  • If your computations require longer interactive computation time and/or a lot of main memory, please contact us in advance.
  • Access to the front end caucluster.rz.uni-kiel.de is possible via an SSH connection established within the internal network of Kiel University:

    $ ssh -X <username>@caucluster.rz.uni-kiel.de

  • Note that the preceding $ sign represents the command line prompt and is not part of the input!
  • With the additional option -X (uppercase X) one activates X11-forwarding.
  • Suitable SSH clients for a Windows PC are PuTTY, X-Win32 or MobaXterm. WinSCP and MobaXterm can be used to transfer data (a command-line example follows below).
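  • From a Linux or macOS machine, data can also be transferred on the command line with scp (a minimal sketch; the file and directory names are placeholders):

    # copy a local file to the home directory on the caucluster
    scp results.dat <username>@caucluster.rz.uni-kiel.de:~/
    # copy a whole directory from the cluster back to the local machine
    scp -r <username>@caucluster.rz.uni-kiel.de:~/project ./project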

 

File systems

  • All users have access to different file systems, each of which is only suitable for certain tasks!

Home directory

  • Global user directory, which is available on the login node and all batch nodes
  • Accessible via the environment variable $HOME
  • Daily data backup
  • Suitable for developing programs and scripts as well as for saving small amounts of important results
  • Not suitable for batch computations!

Work directory

  • Global user directory, which is available on the login node and all batch nodes
  • For new users a work directory is created automatically on the parallel BeeGFS filesystem /work_beegfs.
  • Accessible via the environment variable $WORK
  • No data backup
  • Suitable for performing batch computations

Local disk space

  • Via the environment variable $TMPDIR each batch node provides additional local disk space
  • Very I/O-intensive computations should always be performed using this directory.
  • Attention: The local disk space is only available during a running batch job, i.e., all data on the local disk will be removed automatically after job termination.
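  • A minimal job-script fragment illustrating the intended use of $WORK and $TMPDIR (a sketch; the file names and the program test.ex are placeholders):

    # copy the input data from the work directory to the fast local disk
    cp $WORK/input.dat $TMPDIR/
    cd $TMPDIR
    # run the I/O-intensive computation on the local disk
    $WORK/test.ex input.dat
    # copy the results back before the job ends, because $TMPDIR is removed afterwards
    cp output.dat $WORK/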

Tape library

  • Files which are currently not used should be transferred as archived data (for example as a tar-file) to the additional file system /nfs/tape_cache.
  • Files under /nfs/tape_cache are automatically migrated to tape after a while. Nevertheless, the data can be copied back to the home or work directory at any time.
  • Not suitable for the storage of a lot of small files
  • Recommended size of an archived file: 3GB to 50GB (capacity of a magnetic tape: 750GB)
  • Data transfer to and from the tape library must not be performed with the rsync command
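  • Example of packing results into an archive and moving it to the tape cache (a sketch; the directory names are placeholders and the sub-directory under /nfs/tape_cache is an assumption):

    # pack the results into a single archive of suitable size
    tar -czf results_2017.tar.gz $WORK/results_2017
    # copy the archive to the tape cache file system (do not use rsync for this transfer)
    cp results_2017.tar.gz /nfs/tape_cache/<username>/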

Additional disk space

  • Further global disk space (2.9TB in total) is currently available under the path /bose.

 

Software

Compiler

  • For compiling serial and parallel programs the system provides several compilers:
  • GNU-compiler: gcc, g++, ...
  • Intel-compiler: ifort, icc, icpc, mpiifort, mpiicc, mpiicpc, ...
  • Portland-compiler (PGI): pgf90, pgcc, pgc++, ...
  • For performing multi-node MPI computations, the batch nodes involved have to be able to communicate with each other without a password prompt. To achieve this, each user has to perform the following steps once on the login node (see the command sketch after this list):
    1. Create a key pair with the command ssh-keygen -t rsa and confirm all prompts simply by pressing Return.
    2. Copy the content of the file $HOME/.ssh/id_rsa.pub into the file $HOME/.ssh/authorized_keys.
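  • The two steps expressed as commands (a sketch; the public key is appended with cat so that an already existing authorized_keys file is not overwritten):

    # 1. create an RSA key pair; confirm all prompts by pressing Return
    ssh-keygen -t rsa
    # 2. copy the public key into the list of authorized keys
    cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys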

Libraries

  • Standard libraries: hdf5, netcdf, gsl, boost, eigen, fftw, mkl, ...

User software

  • Standard software: Matlab, Python, R, gnuplot, ...

Module concept

  • Compilers, libraries, software and specific tools are provided on the caucluster via a system-wide module concept.
  • An overview of the installed programs can be obtained via the command:

    module avail

  • Further commands for software usage:

    Command                Explanation
    module load <name>     Loads the module <name>, i.e., performs all settings which are required for using the program
    module unload <name>   Removes the module, i.e., resets all settings
    module list            Lists all modules which are currently loaded
    module show <name>     Displays the settings which are performed by the module
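  • Typical usage (a sketch; the module name intelmpi16.01 is taken from the job-script templates further below and may differ for other software):

    # load the Intel MPI environment and verify that it is active
    module load intelmpi16.01
    module list
    # inspect which settings the module performs
    module show intelmpi16.01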

 

Batch processing

Introduction

  • The principle of batch processing: instead of running programs interactively, a job script describing the required resources is submitted to the workload manager, which queues it and starts it on the compute nodes as soon as the requested resources are available.

 

Fair share

  • On the caucluster we deploy the batch system Slurm as workload manager and distribute computation time in accordance with a fair share principle.
  • The computation time used (on which the job priority is based) is the product of the walltime used and the number of requested cores. The main memory used is not (yet) considered in the fair-share accounting.
  • IMPORTANT! For performing batch computations, the user has to be added manually to the Slurm accounting database. In case your account has not been activated yet, please contact hpcsupport@rz.uni-kiel.de.
  • An overview of the resources used by an individual user or a group can be obtained with the command sshare -a.

 

Slurm

  • The most important Slurm commands for executing batch jobs are the following:
  • Submit a batch job: sbatch <jobscript>
  • List all jobs on the system: squeue
  • List only your own jobs: squeue -u <userid>
  • Delete or terminate a batch job: scancel <jobid>
  • Show details of a batch job: scontrol show job <jobid>
  • Job parameters:

    Parameter               Explanation
    -J, --job-name          Job name
    -o, --output            Stdout file
    -e, --error             Stderr file
    -N, --nodes             Number of nodes
    --tasks-per-node        Number of (MPI) tasks per node
    -c, --cpus-per-task     Number of cores per (MPI) task
    --mem                   Required main memory (in MB per node)
    -t, --time              Time limit (walltime), in minutes or hh:mm:ss
    -p, --partition         Queue/partition
  • If an abbreviation exists for a job parameter, both forms are equivalent in the job script, e.g. #SBATCH -J test or #SBATCH --job-name=test.
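  • A typical workflow on the login node (a sketch; test.sh stands for one of the job scripts from the templates below):

    # submit the job script; Slurm reports the job ID
    sbatch test.sh
    # check the state of your own jobs
    squeue -u <userid>
    # show details of a specific job or cancel it
    scontrol show job <jobid>
    scancel <jobid>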

 

Partitions/Queues

  • The standard partition is called fairq, which calculates the job priority based on a fair share algorithm. The maximum walltime of jobs in this partition is 100 hours.
  • In addition, there is a partition called testq (with higher base priority), in which jobs with a maximum walltime of 30 minutes and not more than 10 cores per node can be performed.
  • The partitions lowq, internq etc. are reserved for the internal use of the Computing Centre.

 

Templates

  • Example script for a serial computation:
    #!/bin/bash
    #SBATCH --job-name=test
    #SBATCH --output=test.out
    #SBATCH --error=test.err
    #SBATCH --nodes=1
    #SBATCH --tasks-per-node=1
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=1000
    #SBATCH --time=01:00:00
    #SBATCH --partition=fairq
    
    export OMP_NUM_THREADS=1
    ./test.ex
  • Example script for a parallel multi-node MPI computation:
    #!/bin/bash
    #SBATCH --job-name=test
    #SBATCH --output=test.out
    #SBATCH --error=test.err
    #SBATCH --nodes=2
    #SBATCH --tasks-per-node=40
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=10000
    #SBATCH --time=01:00:00
    #SBATCH --partition=fairq
    
    export OMP_NUM_THREADS=1
    module load intelmpi16.01
    mpirun -np 80 ./test.ex
    
  • Example script for a multi-node hybrid computation:
    #!/bin/bash
    #SBATCH --job-name=test
    #SBATCH --output=test.out
    #SBATCH --error=test.err
    #SBATCH --nodes=2
    #SBATCH --tasks-per-node=4
    #SBATCH --cpus-per-task=10
    #SBATCH --mem=10000
    #SBATCH --time=01:00:00
    #SBATCH --partition=fairq
    
    export OMP_NUM_THREADS=10
    module load intelmpi16.01
    mpirun -np 8 ./test.ex

 

  • After requesting the required resources, it is also possible to work interactively on the batch node(s). At the moment, there are two possibilities:
    1. Example for an interactive session in which the login prompt remains on the front end and commands are executed on the compute node only when prefixed with srun (see the usage sketch after this list):

      salloc --nodes=1 --cpus-per-task=1 --time=10 --partition=testq
    2. Example for an interactive computation, in which all commands are executed on the compute node:

      srun --pty --nodes=1 --cpus-per-task=1 --time=10 --partition=fairq /bin/bash
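  • Usage sketch for the first variant (the commands after salloc are illustrative; hostname and ./test.ex are placeholders):

    # request an allocation; the shell stays on the front end
    salloc --nodes=1 --cpus-per-task=1 --time=10 --partition=testq
    # commands prefixed with srun are executed on the allocated compute node
    srun hostname
    srun ./test.ex
    # leaving the salloc shell releases the allocation
    exit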


Support and Consulting

HPC-Support-Team: hpcsupport@rz.uni-kiel.de
Responsible contact persons at the Computing Centre:
Please see HPC-Support and Consulting.