Maintenance Windows

8th to 23rd of July

During this period all the clusters were unavailable.

We carried out work including upgrading the OS and the software stack on Fidis and Helvetios.

Note that no work was performed on Deneb other than regular maintenance operations.

August 13th (Fidis) and August 20th (Helvetios)

The standard (quarterly) maintenance windows remain in place to allow for any corrections required following the major upgrades.

Details

Operating System and Slurm scheduler

To provide a modern environment and apply performance, security and bug fixes, we have changed the operating system and the batch system.

Slurm scheduler

No changes were made to the Slurm configuration, but the move to this version will allow us to introduce some small changes in the coming months.

However, the change of version implies some changes in user visible interfaces, in particular:

Jobs in the queue during the maintenance

As the operating system version, the scheduler and the software stack have all changed, it is unlikely that jobs submitted before the maintenance will succeed. Nevertheless, these jobs will remain in the queue and will be put on user hold, meaning they will not start unless their owner explicitly releases them.

# to list your jobs in the queue
Squeue

# to release a job so that it starts running
scontrol release job_list
# job_list is a comma separated list of job IDs

Any jobs still on user hold at the time of the August maintenance of their respective cluster will be permanently cancelled.
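To release every job that is on user hold in one go, the job list can be built from the queue itself. The following is a minimal sketch, assuming the standard Slurm `squeue` options (`-h`, `-t PD`, `-o "%i %r"`) and the `JobHeldUser` pending reason; the helper name `held_job_list` is hypothetical and not part of the cluster tooling.

```shell
# Hypothetical helper: reads lines of the form "<jobid> <reason>" on stdin
# and prints a comma separated list of the job IDs that are on user hold.
held_job_list() {
    awk '$2 == "JobHeldUser" { ids = (ids == "" ? $1 : ids "," $1) }
         END { if (ids != "") print ids }'
}

# Intended use on a cluster login node (commented out, as it needs Slurm):
#   job_list=$(squeue -u "$USER" -h -t PD -o "%i %r" | held_job_list)
#   [ -n "$job_list" ] && scontrol release "$job_list"
```

Check that each job still makes sense with the new software stack before releasing it; a job you no longer need can be removed with `scancel` instead.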

Storage

We have deployed upgrades and performance improvements to the filesystems:

Infiniband Fabric

On Fidis, the fabric was reconfigured to improve inter-node performance and, in particular, to give more consistent performance across different groups of nodes. The performance of access to the local /scratch filesystem is unchanged.

New Software Stack

The software stack is everything you see and load with the module command. Every summer we release a new bundle of packages, the main changes being the compiler and MPI versions as well as newer versions of many packages.

New release (2019.07 - humagne)

During the maintenance period in July the new release of our software stack was deployed, and is now known as the 'stable' release.

With this release newer versions of almost all packages and libraries were made available.

In particular the supported compilers are now

Deprecated release (2018.07 - paien)

The previous stable release is still available under the name 'deprecated'.

To use this release you will need to change to the 'deprecated' release before loading any modules:

$ slmodules -r deprecated
$ module load <module name>

Previous deprecated (2017.07 - cornalin)

Please also be aware that the previous 'deprecated' set of software modules is no longer available.


If you are still using any software packages or libraries from the deprecated release that are not available in the stable release, please let us know.

  • let us know if you use any packages/libraries not yet available
  • contact us with any questions or issues by sending an email to 1234@epfl.ch with a subject starting with 'HPC'