NEC HPC-Linux-Cluster Upgrade

Target audience: students: no; employees: yes; institutions: yes; student groups: no

Dear NEC-Users,

After an extended downtime, we are pleased to inform you that the NEC HPC system is open again for user access. With the system upgrade, substantially more compute performance is available on the NEC HPC-Linux-Cluster, while the powerful NEC SX-ACE vector system continues to operate without any changes.

On the Linux-Cluster partition, the upgrade amounts to an increase of the theoretical peak performance by a factor of 9. More specifically, the old x86 compute nodes (Sandy-Bridge) have been replaced by 180 brand-new nodes with Intel Xeon Gold 6130 (Skylake) processors, each having 32 cores and either 192GB or 384GB main memory.

Furthermore, the ScaTeFS file system has been enlarged and now provides a total of 5PB of work space. As a new feature, this work space is now governed by per-user quotas on disk space (block size) and on the number of inodes.

Below, we summarize important changes that users should bear in mind when connecting to and using the upgraded system.

We hope you will enjoy working on the upgraded system and are looking forward to a high workload on the batch system.

If you encounter problems, do not hesitate to contact us via hpcsupport@rz.uni-kiel.de.

Kind regards,

Your HPC Support Team


 

A. User access

  • As the SSH host key of the nesh-fe front ends has changed, your first SSH login after the upgrade may fail with a warning such as:

    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

  • The simplest option to regain access is to delete the .ssh/known_hosts file in your home directory on the computer from which you log in to the nesh-fe front ends. Alternatively, remove only the line specified in the error message from the known_hosts file.
  • You may also follow the instruction given in the error message and run the ssh-keygen command exactly as your system prints it.
  • To change your password, please use the command yppasswd.
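
The second option above (removing only the offending line) can be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not an official procedure: the function name remove_known_hosts_line is hypothetical, and the line number must be taken from your own error message. With OpenSSH, the command ssh-keygen -R nesh-fe.rz.uni-kiel.de removes all stored keys for that host name and is usually the more convenient route.

```shell
# Hypothetical helper: delete one line from a known_hosts file and keep a
# backup copy next to it. LINE is the number ssh reports after the colon in
# "Offending ... key in FILE:LINE".
remove_known_hosts_line() {
    file="$1"
    line="$2"
    # Accept only positive integers as line numbers.
    case "$line" in
        ''|*[!0-9]*|0)
            echo "usage: remove_known_hosts_line FILE LINE" >&2
            return 1
            ;;
    esac
    # Keep the original as FILE.bak, then rewrite FILE without the given line.
    cp -p "$file" "$file.bak" || return 1
    sed "${line}d" "$file.bak" > "$file"
}

# Example call (commented out; replace 12 with the line number from your message):
# remove_known_hosts_line "$HOME/.ssh/known_hosts" 12
```

If the result is not what you expected, the previous state can be restored from the .bak copy.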

 

B. Login and compute nodes

  • The operating system has been upgraded to Red Hat Linux version 7.4.
  • For smoother operation the login nodes (nesh-fe.rz.uni-kiel.de) now have 768GB main memory.
  • The batch classes clexpress, clmedium, and cllong access nodes with 192GB main memory; the batch class clbigmem offers up to 384GB. For details, see NEC HPC-Linux-Cluster.
  • The clfocean batch class is out of service.
  • The clfo2 batch class is still in production, operating on the Haswell nodes with 24 cores and at most 128GB main memory.
  • Use the command qcl to check the available batch queues and current job limits.
  • There are no changes to NQSII/PBS batch commands and variables.
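
To pick a batch class for a given memory demand, the per-node figures listed above can be written down as a small lookup. This is only an illustrative sketch: the function names class_node_memory_gb and fits_in_class are hypothetical, the values are the installed memory sizes quoted in this announcement, and the memory actually available to a job is typically somewhat lower. The output of qcl remains the authoritative source for current limits.

```shell
# Installed main memory per node (in GB) for each batch class, as listed above.
class_node_memory_gb() {
    case "$1" in
        clexpress|clmedium|cllong) echo 192 ;;
        clbigmem)                  echo 384 ;;
        clfo2)                     echo 128 ;;
        *)
            echo "unknown or retired batch class: $1" >&2
            return 1
            ;;
    esac
}

# Hypothetical check: succeeds if a per-node memory request (whole GB)
# does not exceed the installed memory of the given batch class.
fits_in_class() {
    limit=$(class_node_memory_gb "$1") || return 2
    [ "$2" -le "$limit" ]
}

# Example: a job needing 300GB per node does not fit clmedium but fits clbigmem.
if fits_in_class clbigmem 300; then
    echo "300GB per node fits clbigmem"
fi
```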

 

C. ScaTeFS file system ($WORK) and user quotas

  • Each user has work space on either the path /sfs/fs1/... or /sfs/fs2/..., which is accessible via the environment variable $WORK. Please note that the absolute path stored in your $WORK variable may have changed, so check it with echo $WORK.
  • As of now, work space on the ScaTeFS file system is controlled by user quotas, which means that disk space and inodes (number of files plus directories) are limited resources.
  • Use the command workquota to check your current usage and settings.
  • The default hard quota is approximately 2TB (2000GB).
  • Users who already exceed the default hard quota can check their current usage and settings with the workquota command.
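
Since inodes are counted as files plus directories, it can help to see which of your directories hold the most entries. The following is a rough sketch using standard find, tr and wc; the function name count_entries is hypothetical, and the number it prints is only a local estimate. The workquota command remains the authoritative source for your usage and limits.

```shell
# Hypothetical helper: count all entries (files plus directories) below a
# directory, not counting the directory itself.
count_entries() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "usage: count_entries DIR" >&2
        return 1
    fi
    # NUL-terminated names keep file names containing newlines from being
    # counted more than once; each entry contributes exactly one NUL byte.
    n=$(find "$dir" -mindepth 1 -print0 | tr -cd '\0' | wc -c)
    # Arithmetic expansion strips any padding that wc may add.
    echo $((n))
}

# Example calls (commented out; both can take a while on large directory trees):
# echo "$WORK"
# count_entries "$WORK"
# du -sh "$WORK"
```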

 

D. Software

  • The overall software stack (except for the SX-ACE) has been rebuilt for the new operating system Red Hat Linux version 7.4.
  • Use the command module avail to display the currently available software, and let us know if you need additional programs installed.
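
Because the list printed by module avail can be long, a small search wrapper may be convenient. This is a sketch only: the function name find_module is hypothetical, and it assumes that module avail writes its listing to stderr, as common Environment Modules installations do, which is why stderr is merged into stdout before filtering.

```shell
# Hypothetical helper: search the available modules for a pattern,
# ignoring upper/lower case.
find_module() {
    pattern="$1"
    # The module command only exists where Environment Modules is set up.
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; are you logged in to the cluster?" >&2
        return 127
    fi
    # Merge stderr into stdout so that grep sees the module listing.
    module avail 2>&1 | grep -i -- "$pattern"
}

# Example call (commented out; the search term is arbitrary):
# find_module compiler
```

The function returns a non-zero status if nothing matches, so it can also be used in scripts to test whether a package is installed.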