A significant amount of work will take place on the clusters during the summer. There are two periods of downtime (July and August) that you should be aware of. More details are given below.
During this period all the clusters will be unavailable. We will be carrying out work including upgrading the OS and the software stack.
As the operating system version, the scheduler and the software stack will all change, the queues will be emptied. Any jobs still in the queue on the 9th of July will be cancelled.
The standard (every three months) maintenance days remain in place to allow for any corrections required following the major upgrades.
In order to provide a modern environment and to apply performance, security and bug fixes, we will be changing the operating system and batch system.
As a consequence of the reinstallation, the ssh host keys have changed for all the nodes (login and compute). This is why you see the "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!" message. First remove the offending key, then check the new key against the list below before accepting it. The key type picked up (rsa, dsa) depends on your ssh client version.
Check the current saved key
ssh-keygen -F fidis.epfl.ch -l
Remove the old key
ssh-keygen -R fidis.epfl.ch
Here are the keys for the login nodes:
DSA SHA256:HvbcrjP3X29Bn0g1OgXB+vHLEjWuSWvoaf/vLdZoe/Q MD5:7a:a0:c8:f2:e9:2d:7f:e7:66:69:4a:d3:b8:dd:22:bd
ECDSA SHA256:mLntSNT2Rz69Nba7jOyWt6r+kwF4zpmZZlyrRGfRgXw MD5:5a:c4:54:4d:c5:44:9c:f5:10:1d:e8:17:09:ab:27:4e
ED25519 SHA256:Syy7jQvfKZKMUcWXBM+MIZLxPyVxsdGEiSCylr0kmQI MD5:aa:3c:6e:18:9d:95:3f:cc:73:77:a8:f9:1a:fb:b7:9a
RSA SHA256:mqLUyQMqKNzC8b+HgBPvf3QRnojeZWFJNWHCt7dE1FM MD5:2f:d3:37:32:88:4f:e6:70:29:15:d8:48:10:b4:d5:1f

DSA SHA256:jjg2UsZi+5vn4JVSHhHO+8nCbJe+PYeus5oXDy5HrIg MD5:59:d4:c1:a0:97:bc:90:ae:69:46:af:03:41:3c:3b:5b
ECDSA SHA256:/4HY5EVScDLu/Btzr0Rr7b+G9vcZtqp2a3P6NoH5Ey4 MD5:95:76:2b:0d:10:ad:2d:be:ca:cd:8e:37:3b:f5:56:3f
ED25519 SHA256:fUxBEzD/f9VKiEzApNQQ6bs2p/9d1vUcYeplHrzNVfc MD5:f0:03:14:d8:6c:a0:66:61:b9:65:18:04:b2:5a:27:ef
RSA SHA256:NbpSOygH72m2b4xb4bptwv0of8RXhHiYMJeuUsG4hcA MD5:b8:02:e0:27:8b:75:ce:9a:3f:5a:78:18:de:7e:3a:15

DSA SHA256:jjg2UsZi+5vn4JVSHhHO+8nCbJe+PYeus5oXDy5HrIg MD5:59:d4:c1:a0:97:bc:90:ae:69:46:af:03:41:3c:3b:5b
ECDSA SHA256:kYv/aUHBYR1v7NUGDqieMEET9EcB3Uk4CDVSaTuoJ5c MD5:b6:53:ec:b0:a8:aa:a9:8c:76:e5:f0:ac:d5:8d:88:11
ED25519 SHA256:R/BxGi0pIlW9s7o2xNOcxcHuo4fneu+hWAghR1nimDo MD5:04:f6:b9:f1:77:b2:a6:c3:28:7c:ac:b0:30:93:59:4c
RSA SHA256:NbpSOygH72m2b4xb4bptwv0of8RXhHiYMJeuUsG4hcA MD5:b8:02:e0:27:8b:75:ce:9a:3f:5a:78:18:de:7e:3a:15
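The comparison against the published keys can be scripted. The following is a minimal sketch, not an official tool: the function names published_fingerprints and is_published_fingerprint are hypothetical, and because the list above does not say which row belongs to which login node, the check only confirms that a fingerprint appears somewhere in the published set.

```shell
# Published SHA256 fingerprints, copied from the list of login node keys above.
published_fingerprints() {
  cat <<'EOF'
SHA256:HvbcrjP3X29Bn0g1OgXB+vHLEjWuSWvoaf/vLdZoe/Q
SHA256:mLntSNT2Rz69Nba7jOyWt6r+kwF4zpmZZlyrRGfRgXw
SHA256:Syy7jQvfKZKMUcWXBM+MIZLxPyVxsdGEiSCylr0kmQI
SHA256:mqLUyQMqKNzC8b+HgBPvf3QRnojeZWFJNWHCt7dE1FM
SHA256:jjg2UsZi+5vn4JVSHhHO+8nCbJe+PYeus5oXDy5HrIg
SHA256:/4HY5EVScDLu/Btzr0Rr7b+G9vcZtqp2a3P6NoH5Ey4
SHA256:fUxBEzD/f9VKiEzApNQQ6bs2p/9d1vUcYeplHrzNVfc
SHA256:NbpSOygH72m2b4xb4bptwv0of8RXhHiYMJeuUsG4hcA
SHA256:kYv/aUHBYR1v7NUGDqieMEET9EcB3Uk4CDVSaTuoJ5c
SHA256:R/BxGi0pIlW9s7o2xNOcxcHuo4fneu+hWAghR1nimDo
EOF
}

# Succeeds (exit status 0) only if the fingerprint passed as $1 matches
# one published line exactly (-x whole line, -F fixed string, -q quiet).
is_published_fingerprint() {
  published_fingerprints | grep -qxF -- "$1"
}

# Possible usage, assuming an OpenSSH client recent enough to read a key
# from standard input and to print SHA256 fingerprints:
#   fp=$(ssh-keyscan -t ed25519 fidis.epfl.ch 2>/dev/null | ssh-keygen -lf - | awk '{print $2}')
#   is_published_fingerprint "$fp" && echo "fingerprint is in the published list"
```

If the check fails, do not accept the new key; compare the fingerprint by eye against the list and report the mismatch.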
We will carry out upgrades and performance improvements to the filesystems:
We will also complete the migration to the new global storage system by physically removing the old GSS storage.
The software stack is everything you see and load with the module command. Every summer we release a new bundle of packages; the main changes are the compiler and MPI versions, along with newer versions of many packages.
During the maintenance period in July (9-16) the next release of our software stack will be promoted to stable.
As of today the new release is available under the name 'future' and can already be used by issuing the following command before loading any modules:
$ slmodules -r future
$ module load <module name>
We encourage you to try the new release as soon as possible and let us know (by opening a ticket) if you find any issues or missing software packages.
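Checking a list of modules against the new release can be automated. The sketch below is an illustration only: report_missing_modules is a hypothetical helper, and it assumes that the module command on the clusters supports the is-avail sub-command (present in recent Lmod and Environment Modules versions, but not confirmed for this installation).

```shell
# Print the name of every requested module that the active release
# cannot provide. `module is-avail NAME` is expected to return a
# zero exit status only when NAME can be loaded.
report_missing_modules() {
  for name in "$@"; do
    if ! module is-avail "$name" >/dev/null 2>&1; then
      echo "missing: $name"
    fi
  done
}

# Possible usage after switching to the new release
# (the module names are placeholders):
#   slmodules -r future
#   report_missing_modules gcc my-package
```

Any name reported as missing is a candidate for a ticket.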
With this release, newer versions of almost all packages and libraries are available. Please be aware that for packages currently available in multiple versions, only one version will be kept once the 'future' release becomes the default.
In particular the supported compilers are now
If you compile code now using the 'future' release, it may need to be recompiled after July 16th: although the software release is the same, the underlying operating system and scheduler versions will be different.
The current stable release will still be available under the name 'deprecated'.
To use what is now the stable release after the 16th of August you will need to change to the 'deprecated' release before loading any modules:
$ slmodules -r deprecated
$ module load <module name>
Please also be aware that the current deprecated set of software modules will no longer be available.
If you are still using any software packages or libraries from the current deprecated release, please check the future release today to see whether they are available there, and let us know immediately if they are not.
There is now an additional "build" partition available on the clusters to allow you to compile codes for the different architectures. Please note that, unlike the "debug" partition, these nodes are shared. To make use of them, please use the Sinteract tool with the "-s" flag to specify the architecture, and select the appropriate number of CPUs and amount of memory.
These nodes also have a slightly different operating system configuration to allow debuggers such as GDB to work correctly.
In order to increase system stability the nodes in the serial partition on Deneb now have two CPUs reserved for system use. This means that a maximum of 14 cores are available per serial node.
The nodes with 256 and 512 GB of memory on Deneb are now part of the serial partition as they are intended for non-MPI tasks.
If you require access to these nodes then you should explicitly ask for the serial partition with "-p serial".
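A batch script requesting the serial partition might look like the sketch below. It assumes the scheduler is Slurm, which the text does not state explicitly, and the resource values are illustrative rather than recommended settings.

```shell
#!/bin/bash
# Sketch of a batch script, assuming the scheduler is Slurm.
# --partition=serial is the long form of "-p serial".
# The memory and CPU values are placeholders; adjust them to your job.
#SBATCH --partition=serial
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=200G

# Slurm sets SLURM_CPUS_PER_TASK inside a job; fall back to 1 elsewhere.
cores="${SLURM_CPUS_PER_TASK:-1}"
echo "Running a non-MPI task on ${cores} core(s)"
```

Under Slurm such a script would be submitted with sbatch; the same partition choice can also be given on the command line with "-p serial".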
For large-memory MPI calculations you should use Fidis, which has 72 nodes with 256 GB of memory connected via fully non-blocking FDR Infiniband.