
Maintenance Windows

20th to 22nd of July

This year, our annual maintenance will be dedicated exclusively to work on the central storage and its associated network.

This means it will be shorter than usual: only three days.

Since all the shared filesystems (/home, /work, /ssoft) will be unavailable, all the clusters will be unavailable.
The /scratch filesystems will not be affected, but they will be inaccessible because the login nodes will be closed.

Jobs will not run during this period, but any jobs in the queues will resume once the maintenance is over, as no user-visible changes will take place (except for the Node Allocation Policy change; see below).

Changes

Node Allocation policy change (parallel partitions - ALL clusters)

This is an important change with potential impact on cost. Please read it carefully and contact us in case of any doubt.

Until now, if you ran on less than a full node on the parallel partitions, the rest of the node was still reserved for you alone. Unfortunately, we found that this heavily distorts the fairshare mechanism.

This possibility will be removed after the summer maintenance. From then on, jobs on the parallel partitions will always be allocated, and charged for, whole nodes.

If your jobs use less than a full node, please move them to the serial partition, which has been available on the Fidis cluster since the end of June. In this partition you can be allocated less than a full node (a single CPU core, for example).
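A minimal job script for the serial partition could look like the sketch below. The partition name comes from this announcement; the account name, time limit, and program are placeholders you should replace with your own.

```shell
#!/bin/bash
#SBATCH --partition=serial     # the serial partition mentioned above
#SBATCH --nodes=1
#SBATCH --ntasks=1             # request a single CPU core
#SBATCH --mem=4G               # example memory request; adjust as needed
#SBATCH --time=01:00:00        # example time limit; adjust as needed

# placeholder executable; replace with your own program
srun ./my_program
```

With this request you are allocated (and charged for) one core rather than the whole node.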

The serial partition currently has 60 nodes; its size will be adjusted based on demand.

The following table presents the impact on some typical jobs:

| Job Type | Partition | Allocation Before | Allocation Now | Action required? |
| --- | --- | --- | --- | --- |
| 1 node, 1 core | parallel | 1 core | 1 node (warning) | Yes, move to the serial partition. |
| 1 node, 14 cores | parallel | 14 cores | 1 node (warning) | Yes, move to the serial partition. |
| 1 node, 28 cores | parallel | 28 cores | 1 node (tick) | No. |
| 2 nodes, 4 cores per node | parallel | 2 * 4 cores | 2 nodes (warning) | Maybe; evaluate whether you really need the full resources of each node (memory, etc.). |
| 2 nodes, 24 cores per node | parallel | 2 * 24 cores | 2 nodes (tick) | No. |
| 2 nodes, 18 cores per node | parallel | 2 * 18 cores | 2 nodes (tick) | No. |

Job Submit plugin (Fidis)

On Fidis, single-node jobs that use less than 75% of the cores of a node are now automatically redirected to the serial partition at submission time.

For example, the first two jobs in the table above will automatically be moved to the serial partition at submission time (unless you explicitly request exclusive access to the node).
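If you do want the whole node despite using fewer cores, Slurm's `--exclusive` flag requests exclusive node access, which should prevent the redirection. The sketch below assumes the partition is named `parallel` as in the table above; the program and time limit are placeholders.

```shell
#!/bin/bash
#SBATCH --partition=parallel   # parallel partition, as in the table above
#SBATCH --nodes=1
#SBATCH --ntasks=14            # fewer than 75% of a 28-core node
#SBATCH --exclusive            # request the whole node explicitly
#SBATCH --time=01:00:00        # example time limit; adjust as needed

# placeholder executable; replace with your own program
srun ./my_program
```

Remember that under the new policy you will be charged for the full node in this case.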

Questions?

Contact us via email: 1234@epfl.ch

