*** Under construction ***
Resources on HPC clusters and supercomputers are accounted for in either core-hours or node-hours (or, more generally, CPU-hours). One core-hour corresponds to one core used for one wall-clock hour, counted from the time the core is allocated to the time it is deallocated.
The total resource available is computed by multiplying the total number of nodes available for computation by the total allocation time in hours. For instance, a 400-node cluster provides 400*24*365 = 3'504'000 node-hours over one year.
Users are typically given a percentage of the total resource, which has a node-hours equivalent. For instance, 10% of that 400-node cluster corresponds to 40 nodes over one year, i.e. 40*24*365 = 350'400 node-hours.
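This arithmetic is easy to script. Below is a minimal Python sketch of the computation described above (the numbers are the examples from this section; this is not an official SCITAS tool):

HOURS_PER_YEAR = 24 * 365  # 8760 hours

def node_hours(nodes, hours):
    # Total node-hours: number of nodes times wall-clock hours of allocation.
    return nodes * hours

total = node_hours(400, HOURS_PER_YEAR)  # 3'504'000 node-hours over one year
share = 0.10 * total                     # a 10% allocation: 350'400 node-hours
print(total, share)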
HPC clusters use a workload manager to dispatch jobs and to account for the resources consumed by each user.
The fair-share algorithm in SLURM is described at http://slurm.schedmd.com/fair_tree.html.
Info: On SCITAS machines, recorded usage decays with a half-life of one week.
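In practice, this means that past consumption weighs exponentially less as time passes. Here is a minimal sketch of such a decay, assuming plain exponential decay with a one-week half-life (the exact decay schedule applied by SLURM may differ):

def decayed_usage(raw_usage, days_ago, half_life_days=7.0):
    # Usage recorded `days_ago` days in the past, decayed with the given half-life.
    return raw_usage * 0.5 ** (days_ago / half_life_days)

print(decayed_usage(1000, 0))   # 1000.0 -> just consumed, full weight
print(decayed_usage(1000, 7))   # 500.0  -> one week old, half weight
print(decayed_usage(1000, 14))  # 250.0  -> two weeks old, quarter weight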
To see the share for your group, you can use the "Sshare" command.
SCITAS machines use the SLURM workload manager to schedule users' jobs. SLURM arbitrates contention in the job queue with a fair-share algorithm, which prioritizes jobs so that each user's actual usage matches their share as closely as possible. SCITAS clusters use a particular flavor of this algorithm called fair-tree.
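To give an intuition of the fair-tree idea, here is a minimal sketch (an assumed structure for illustration, not SLURM's implementation): at each level of the account hierarchy, siblings are ranked by their "Level FS" value (explained below), and every user under a higher-ranked account outranks every user under a lower-ranked one.

def level_fs(norm_shares, effectv_usage):
    # Level FS = Norm Shares / Effectv Usage; infinite when nothing was consumed.
    return float("inf") if effectv_usage == 0.0 else norm_shares / effectv_usage

def rank_users(node, ranking):
    # Each node is (name, norm_shares, effectv_usage, children); users are leaves.
    name, shares, usage, children = node
    for child in sorted(children, key=lambda c: level_fs(c[1], c[2]), reverse=True):
        if child[3]:                  # an account: recurse into its subtree
            rank_users(child, ranking)
        else:                         # a user: next in the priority order
            ranking.append(child[0])

tree = ("root", 1.0, 1.0, [
    ("scitas-ge", 0.007752, 0.000005, [
        ("rezzonic", 0.043478, 0.494928, []),
        ("eroche",   0.043478, 0.250000, []),
        ("aubort",   0.043478, 0.000000, []),
    ]),
])

ranking = []
rank_users(tree, ranking)
print(ranking)  # ['aubort', 'eroche', 'rezzonic'] -> less usage, higher priority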
To check their priority, users can run the "Sshare" command, which is available on any SCITAS cluster. A typical output looks as follows:
$ Sshare
             Account       User Raw Shares Norm Shares   Raw Usage  Norm Usage Effectv Usage  FairShare   Level FS
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ----------
scitas-ge                                1    0.007752        1376    0.000003      0.000005            1468.763590
scitas-ge                aubort          1    0.043478           0    0.000000      0.000000   0.290000        inf
scitas-ge              clemenco          1    0.043478           0    0.000000      0.000000   0.290000        inf
scitas-ge                 cubuk          1    0.043478           0    0.000000      0.000000   0.290000        inf
scitas-ge                 culpo          1    0.043478           0    0.000000      0.000000   0.290000        inf
scitas-ge              degiorgi          1    0.043478           0    0.000000      0.000000   0.290000        inf
scitas-ge                eroche          1    0.043478         344    0.000001      0.250000   0.253333   0.173913
scitas-ge               nvarini          1    0.043478           0    0.000000      0.000000   0.290000        inf
scitas-ge                 qubit          1    0.043478         351    0.000001      0.255072   0.250000   0.170455
scitas-ge              rezzonic          1    0.043478         681    0.000001      0.494928   0.246667   0.087848
scitas-ge               richart          1    0.043478           0    0.000000      0.000000   0.290000        inf
scitas-ge               rmsilva          1    0.043478           0    0.000000      0.000000   0.290000        inf
scitas-ge                   sue          1    0.043478           0    0.000000      0.000000   0.290000        inf
scitas-ge                  topf          1    0.043478           0    0.000000      0.000000   0.290000        inf
The value used to decide the priority of a job is the "Level FS": the ratio of the "Norm Shares" value to the "Effectv Usage" value. The higher the Level FS, the higher the priority; a Level FS below 1 indicates overconsumption, while a value above 1 indicates underconsumption.
In this formula, "Norm Shares" is the percentage of the cluster allocated to the account (the shares being expressed in terms of cores), whereas "Effectv Usage" augments the normalized usage (the user's raw usage normalized to the total number of CPU-seconds of all jobs run) to account for usage from sibling accounts. Within a group, all users have equal weight and therefore hold 1 share each.
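As an illustration, the Level FS values in the output above can be recomputed from the "Norm Shares" and "Effectv Usage" columns; the small sketch below simply applies the ratio described here:

def level_fs(norm_shares, effectv_usage):
    # Level FS = Norm Shares / Effectv Usage; infinite when there is no usage yet.
    return float("inf") if effectv_usage == 0.0 else norm_shares / effectv_usage

print(level_fs(0.043478, 0.494928))  # rezzonic: ~0.087848 -> overconsuming
print(level_fs(0.043478, 0.255072))  # qubit:    ~0.170455 -> overconsuming
print(level_fs(0.043478, 0.000000))  # aubort:   inf       -> no usage, top priority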
More information about SLURM, fair-share and fair-tree can be found here:
https://slurm.schedmd.com/overview.html
https://slurm.schedmd.com/priority_multifactor.html
https://slurm.schedmd.com/fair_tree.html