NEC HPC-Linux-Cluster



The NEC HPC-Linux-Cluster is part of a hybrid NEC high-performance system at the University Computing Centre. It consists of a NEC SX-ACE vector system with a theoretical peak performance of 65.6 TFlops and the scalar Linux cluster with a total of 6192 cores and a theoretical peak performance of 404.4 TFlops. Both systems are accessed via the same front end and share a global 5 PB file system. Batch computations are managed by a common batch system (NQSII).


 

Hardware

Front end NEC HPC 124Rh-1

  • 4 systems each with
    • 2 Intel Xeon Gold 6130 (Skylake-SP) processors (2.1 GHz)
    • 32 cores per node and 768 GB main memory

 

NEC Linux-Cluster batch nodes

  • 172 nodes each with
    • 2 Intel Xeon Gold 6130 (Skylake-SP) processors (2.1 GHz)
    • 32 cores per node and 192 GB main memory
  • 18 nodes each with
    • 2 Intel Xeon E5-2680v3 (Haswell-EP) processors (2.5 GHz)
    • 24 cores per node and 128 GB main memory
  • Connection: EDR InfiniBand

 

Operating system

  • Red Hat Linux (Release 7.4)

 

Access

User account

  • Using our NEC HPC systems requires an additional validation. To apply for an account, please fill in and sign the request for the use of a High-Performance Computer (mark the NEC High-Performance Computing system with a cross) and send the form to the University Computing Centre.

System access

  • An interactive login is only possible via nesh-fe.rz.uni-kiel.de. After logging in, you will automatically be redirected to one of the three available front ends.
  • Access to the NEC Linux-Cluster is only possible via an SSH connection established within the internal network of Kiel University:

    $ ssh -X <username>@nesh-fe.rz.uni-kiel.de

  • Please note that the preceding $-sign represents the command line prompt and is not part of the input!
  • The additional option -X (uppercase X) activates X11 forwarding.
  • Suitable SSH clients for a Windows PC are PuTTY, X-Win32 or MobaXterm. MobaXterm and WinSCP can be used for data transfer.
  • Computations on the NEC Linux-Cluster should primarily be performed in batch mode. Only short interactive operations (e.g. compilation, testing of scripts or programs) should be performed on the front end. If your computations require longer interactive computation time and/or a lot of main memory, please contact us in advance.
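
From a Linux or macOS machine, data can also be transferred on the command line over the same SSH connection, for example with scp. A minimal sketch (the file names and the remote target directory are placeholders):

    # Copy a local input file to a directory on the cluster (remote path is an example)
    scp inputdata.tar <username>@nesh-fe.rz.uni-kiel.de:/path/to/your/workdir/
    # Copy a result file from the cluster back to the current local directory
    scp <username>@nesh-fe.rz.uni-kiel.de:/path/to/your/workdir/results.tar .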

Password change

  • To change your password please use the command

    yppasswd

Display of work quota

  • By default, your ScaTeFS work quota is displayed upon login on the nesh-fe front ends. These quotas refer to the used and available disk space and the number of inodes (approximately the number of files) in your $WORK directory.
  • Normally, the used resources are displayed in green, which means that you are below your soft and hard limits (grace period = "none", also green). Once the disk space or the number of inodes reaches the soft limit, the used resources turn orange and a grace period is activated, which counts down from 6 to 0 days (displayed in red). Within these 6 days it is still possible to use further disk space and/or inodes, as long as no hard limit is exceeded. When the grace period expires (grace period = "expired", all corresponding values turn red), no additional disk space or inodes can be used; the same happens when you directly exceed your hard limits. The grace period therefore gives you the chance to free disk space and/or inodes in your $WORK directory.

 

File systems

The front end and all batch nodes share a global 5 PB file system, which contains the home and work directories. In addition, data can be transferred to the tape library.

Home directory

  • Accessible via the environment variable $HOME
  • Globally available on all nodes (front end and batch nodes)
  • Daily data backup
  • Suitable for saving scripts, programs and small results

 

Work directory

  • Accessible on the front end via the environment variable $WORK
  • Globally available on all nodes (front end and batch nodes)
  • File system without data backup
  • User quotas for disk space and inodes apply (see the section "Display of work quota" above)
  • Batch computations should only be performed in this directory
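
A small sketch of how a run directory in $WORK might be prepared before submitting a job (directory and file names are only examples):

    cd $WORK
    mkdir -p myproject/run01                    # separate directory for this run (name is an example)
    cp $HOME/jobs/myjob.nqs myproject/run01/    # copy the batch script from the home directory
    cd myproject/run01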

 

Local disk space

  • For very I/O-intensive calculations, each batch node provides local disk space for storing temporary data.
  • To use the local disk space, insert the following line into your batch script (it defines the $SCRATCH environment variable):
    export SCRATCH="/scratch/"`echo $PBS_JOBID | cut -f2 -d:`
  • Please note that files stored in $SCRATCH are only available within a running batch job, i.e. all data on the local disk will be removed automatically after job termination.
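
A typical (schematic) pattern inside a batch script is to stage the input data to the local disk, run the computation there and copy the results back to $WORK before the job ends; the file and directory names below are placeholders:

    # Define $SCRATCH as described above
    export SCRATCH="/scratch/"`echo $PBS_JOBID | cut -f2 -d:`
    # Stage input data to the local disk of the batch node
    cp $WORK/myjob/input.dat $SCRATCH/
    cd $SCRATCH
    # Run the I/O-intensive computation on the local disk
    ./executablename
    # Copy the results back to $WORK before the job terminates
    cp $SCRATCH/output.dat $WORK/myjob/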

 

Tape library

  • Accessible via the environment variable $TAPE_CACHE
  • Available on the login node or via the batch class feque
  • Files which are currently not used should be transferred to the additional file system /nfs/tape_cache as archived data (for example as a tar file).
  • Files under /nfs/tape_cache will be stored automatically on tape after a while. Nevertheless, it is possible to copy them back to the home or work directory via the login node at any time.
  • Not suitable for storing a large number of small files
  • Recommended size of an archive file: 3 GB to 50 GB (a single tar file should not exceed 1 TB)
  • Data transfer to and from the tape library must not be performed with the rsync command
  • Attention: access is slow. Avoid working directly with files on the tape library; instead, copy files back to the work directory before further processing (such as unpacking).
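
A schematic workflow for archiving data to the tape library and retrieving it later (archive and directory names are placeholders; run these commands on the login node or in the batch class feque):

    # Pack a directory from the work directory into a single tar archive in the tape cache
    cd $WORK
    tar -cf $TAPE_CACHE/project_results.tar project_results/
    # Later: copy the archive back to the work directory and unpack it there
    cp $TAPE_CACHE/project_results.tar $WORK/
    cd $WORK && tar -xf project_results.tar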

 

Software

Compiler

  • For compiling serial and parallel programs, several compilers are available on the NEC Linux-Cluster:
    • GNU compilers: gfortran, gcc and g++
    • Intel compilers: ifort, icc and icpc (available after initializing the environment with the command module load intel17.0.4)
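
Typical compile calls might look as follows (source and executable names are placeholders, the optimization flag is only a suggestion):

    # GNU compiler
    gfortran -O2 -o myprog myprog.f90
    # Intel compiler, after loading the corresponding module
    module load intel17.0.4
    ifort -O2 -o myprog myprog.f90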

 

MPI-parallelization

  • For developing and running MPI-parallelized programs, the NEC Linux-Cluster provides the Intel MPI environment.
  • Initialization of the Intel MPI environment: $ module load intelmpi17.0.4
  • Compiler wrappers:
    • mpiifort, mpiicc and mpiicpc (using the Intel compilers)
    • mpif90, mpigcc and mpigxx (using the GNU compilers)
  • Starting MPI-parallelized programs: $ mpirun $NQSII_MPIOPTS -np 4 ./executable (example using 4 cores; see also the sketch after this list)
  • For multi-node MPI computations, the batch nodes involved must be able to communicate with each other without a password request. To achieve this, each user has to perform the following steps once on the login node:
    1. Create a key pair with the command ssh-keygen -t rsa and confirm all prompts simply by pressing Return.
    2. Append the content of the file $HOME/.ssh/id_rsa.pub to the file $HOME/.ssh/authorized_keys.
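
Putting these steps together, compiling an MPI program and running a short interactive test on the front end might look like this (source and executable names are placeholders; production runs belong in batch jobs, where mpirun is called with $NQSII_MPIOPTS as shown in the batch examples below):

    module load intel17.0.4 intelmpi17.0.4    # initialize the Intel compiler and MPI environment
    mpiifort -O2 -o mpi_prog mpi_prog.f90     # or mpif90 when using the GNU compilers
    mpirun -np 4 ./mpi_prog                   # short test with 4 processes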

 

Libraries

  • netCDF, HDF5, fftw, gsl, PETSc, MKL, ...
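
These libraries are also provided as modules. As an example, compiling a C program against netCDF might look like the following sketch; the module name and the use of nc-config are assumptions, so please check module avail and the module documentation for the exact names on the system:

    module load netcdf                        # module name is an assumption; check module avail
    gcc -O2 -o myprog myprog.c $(nc-config --cflags) $(nc-config --libs)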

 

Software and tools

  • CDO, Ferret, gnuplot, NCO, likwid-Tools, Matlab, Python, R, Turbomole, ...
  • For licensing reasons, there are different Matlab versions for members of Kiel University and employees of GEOMAR. If you belong to the latter, please load the corresponding module with the ending _geomar before use.

 

Module concept

  • Compiler, libraries, software and specific tools are provided via a system-wide module concept, which is also available on the batch nodes.
  • An overview of the installed programs can be obtained by entering the following command: module avail
  • Further commands for software usage:

    Command                 Explanation
    module load <name>      Loads the module <name>, i.e. performs all settings required for using the program
    module unload <name>    Removes the module, i.e. resets all settings
    module list             Lists all modules which are currently loaded
    module show <name>      Displays the settings which are performed by the module
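
For example, a short session for working with the Intel compiler module could look like this:

    module avail                  # list all installed software modules
    module load intel17.0.4       # make the Intel compilers available
    module list                   # show the currently loaded modules
    module show intel17.0.4       # display the settings performed by the module
    module unload intel17.0.4     # reset the settings again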

 

Batch processing

For resource management, we deploy the batch system NQSII in combination with the scheduler "Job Manipulator".

Batch classes

Currently, the following batch classes are available on the NEC Linux-Cluster:

Batch class   max. runtime (walltime)   cores per node   max. main memory per node   max. number of nodes*
clexpress     2 hours                   32               192 GB                      2
clmedium      48 hours                  32               192 GB                      120
cllong        100 hours                 32               192 GB                      50
clbigmem      200 hours                 32               384 GB                      8
clfo2         200 hours                 24               128 GB                      18
feque         1 hour (CPU time)         32               750 GB                      1

* Due to software or hardware issues, the number of available nodes might be less than stated. A list of the currently available resources can be obtained by entering the command qcl.

 

Submitting batch jobs

For the execution of a computation in batch mode, it is important that the user not only tells the batch system which program to execute, but also specifies the resources (computation time, required memory) that the computation needs. This resource information, together with the program call, has to be placed in a script, which is then submitted to the batch system with the command qsub <nqs_script>.

Most important options for submitting batch jobs

NQSII option                    Explanation
#!/bin/bash                     defines the shell
#PBS -T intmpi                  specifies the job type (Intel MPI); only necessary for parallel computations
#PBS -b 2                       number of nodes (here 2)
#PBS -l cpunum_job=16           number of requested cores per node (max. 32 or 24, depending on the node type)
#PBS -l elapstim_req=01:00:00   walltime (here 1 h)
#PBS -l cputim_job=16:00:00     accumulated CPU time per node (here 16 x 1 h)
#PBS -l memsz_job=10gb          main memory required per node (RAM; here 10 GB)
#PBS -N test                    name of the batch job (here test)
#PBS -o test.out                file for standard output (here test.out)
#PBS -e test.err                file for error output (here test.err)
#PBS -q clexpress               requested batch class (here clexpress)
#PBS -j o                       joins stdout and stderr
#PBS -m abe                     email notification when the job begins (b), ends (e) or aborts (a)
#PBS -M <address>               email address to use for the -m options

 

Example batch scripts

Example for a serial calculation:

#!/bin/bash
#PBS -b 1
#PBS -l cpunum_job=1
#PBS -l elapstim_req=01:00:00
#PBS -l cputim_job=01:00:00
#PBS -l memsz_job=10gb
#PBS -N testjob
#PBS -o stdstderr.out
#PBS -j o
#PBS -q clexpress

# Change into qsub directory
cd $PBS_O_WORKDIR

# Start the serial computation
./executablename
# Output of used resources (computation time, main memory) after the job
/usr/bin/nqsII/qstat -f ${PBS_JOBID/0:/}

 

 

Example for a parallel multi-node MPI calculation:

#!/bin/bash
#PBS -T intmpi
#PBS -b 2
#PBS -l cpunum_job=32
#PBS -l elapstim_req=01:00:00
#PBS -l cputim_job=32:00:00
#PBS -l memsz_job=10gb
#PBS -N testjob
#PBS -o stdstderr.out
#PBS -j o
#PBS -q clexpress

# Change into qsub directory
cd $PBS_O_WORKDIR

# Initialize the Intel compiler and MPI environment
module load intel17.0.4 intelmpi17.0.4

# Start the parallel calculation
mpirun $NQSII_MPIOPTS -np 64 ./executablename

# Show used resources after the job
/usr/bin/nqsII/qstat -f ${PBS_JOBID/0:/}

 

NQSII commands for job submission and control

 

The most important commands for working with the NQSII batch system are:

  • qsub <nqs_script>   submits a batch computation
  • qstat                delivers information about your own jobs
  • qdel <jobid>         ends a running job or removes a waiting job
  • qstatall             lists all running and waiting jobs on the entire NEC HPC system
  • qstatace             lists all running and waiting jobs on the entire NEC Linux-Cluster
  • qstat -f <jobid>     delivers further information about the specified job
  • qalter <jobid>       alters the job resources of a waiting job
  • qcat -o <jobid>      displays the standard output that the specified job has produced so far (use the option -n <number> to adapt the number of lines shown)
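
A typical job-management sequence could look like this (the script name is a placeholder and <jobid> stands for the job ID reported by qsub):

    qsub myjob.nqs                # submit the batch script
    qstat                         # check the status of your own jobs
    qstat -f <jobid>              # detailed information on one specific job
    qcat -o <jobid> -n 20         # show 20 lines of the standard output produced so far
    qdel <jobid>                  # cancel the job if necessary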


Support and Consulting

HPC-Support-Team: hpcsupport@rz.uni-kiel.de
Responsible contact persons at the Computing Centre:
Please see HPC-Support and Consulting.