The SCITAS machines use the SLURM workload manager to schedule users’ jobs. SLURM arbitrates contention in the job queue with a fair-share algorithm in order to prioritize jobs and to ensure that each user’s usage matches their share as closely as possible. In particular, SCITAS clusters use a flavor of the fair-share algorithm called fair-tree, which is described at http://slurm.schedmd.com/fair_tree.html.
Info |
---|
Usage on SCITAS machines has a half-life of one week: the weight of past usage in the fair-share calculation is halved every week. |
To see the share for your group, use the "sshare" command. |
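As an illustration of the one-week half-life, the sketch below assumes a plain exponential decay, where usage recorded t days ago counts with a weight of 0.5^(t/7). SLURM's actual decay is driven by its own configuration (the PriorityDecayHalfLife setting in slurm.conf), so treat this as an approximation; `decayed_usage` is a hypothetical helper, not a SLURM command.

```python
HALF_LIFE_DAYS = 7.0  # assumption: the one-week half-life quoted above


def decayed_usage(raw_cpu_seconds: float, age_days: float,
                  half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Weight that usage accrued `age_days` ago still carries today,
    assuming a simple exponential decay with the given half-life."""
    return raw_cpu_seconds * 0.5 ** (age_days / half_life_days)


if __name__ == "__main__":
    usage = 1000 * 3600  # 1000 core-hours, expressed in CPU-seconds
    for weeks in (1, 2, 4):
        remaining = decayed_usage(usage, 7 * weeks) / 3600
        # prints 500.0, 250.0 and 62.5 core-hours respectively
        print(f"{weeks} week(s) ago: {remaining:.1f} core-hours")
```

In other words, a job that ran a month ago weighs only about a sixteenth as much in your fair-share as the same job run today.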
To check your priority, use the "sshare" command, which is available on every SCITAS cluster. A typical output looks like this:
    $ sshare
    Account                    User Raw Shares Norm Shares   Raw Usage  Norm Usage Effectv Usage  FairShare   Level FS
    -------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ----------
    scitas-ge                                1    0.007752        1376    0.000003      0.000005           1468.763590
    scitas-ge                aubort          1    0.043478           0    0.000000      0.000000   0.290000        inf
    scitas-ge              clemenco          1    0.043478           0    0.000000      0.000000   0.290000        inf
    scitas-ge                 cubuk          1    0.043478           0    0.000000      0.000000   0.290000        inf
    scitas-ge                 culpo          1    0.043478           0    0.000000      0.000000   0.290000        inf
    scitas-ge              degiorgi          1    0.043478           0    0.000000      0.000000   0.290000        inf
    scitas-ge                eroche          1    0.043478         344    0.000001      0.250000   0.253333   0.173913
    scitas-ge               nvarini          1    0.043478           0    0.000000      0.000000   0.290000        inf
    scitas-ge                 qubit          1    0.043478         351    0.000001      0.255072   0.250000   0.170455
    scitas-ge              rezzonic          1    0.043478         681    0.000001      0.494928   0.246667   0.087848
    scitas-ge               richart          1    0.043478           0    0.000000      0.000000   0.290000        inf
    scitas-ge               rmsilva          1    0.043478           0    0.000000      0.000000   0.290000        inf
    scitas-ge                   sue          1    0.043478           0    0.000000      0.000000   0.290000        inf
    scitas-ge                  topf          1    0.043478           0    0.000000      0.000000   0.290000        inf
The value used to decide the priority of a job is the "Level FS": the higher the Level FS, the higher the priority. Level FS is the ratio of the "Norm Shares" value to the "Effectv Usage" value, so a Level FS below 1 indicates overconsumption and a Level FS above 1 indicates underconsumption. Users with no recent usage have an Effectv Usage of 0 and are therefore shown with a Level FS of "inf".
In this formula, "Norm Shares" is the share of the resources allocated to an account (or to a user within its account), normalized to the total number of shares assigned at that level; shares are expressed in terms of cores. "Effectv Usage" augments the normalized usage (the user's raw usage normalized to the total number of CPU-seconds of all jobs run) to take into account the usage of sibling accounts. Within a group, all users have equal weight and therefore 1 share each.
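The relationship between these columns can be checked against the example output. The minimal Python sketch below assumes, as the scitas-ge rows suggest, that a user's Effectv Usage is its Raw Usage divided by the Raw Usage of its parent account, and that Level FS is Norm Shares divided by Effectv Usage. The function names are hypothetical and this is not SLURM's own implementation.

```python
import math


def effectv_usage(raw_usage: dict[str, float]) -> dict[str, float]:
    """Fraction of the parent account's raw usage consumed by each child
    (assumption: Effectv Usage = child Raw Usage / parent Raw Usage)."""
    total = sum(raw_usage.values())
    if total == 0:
        return {name: 0.0 for name in raw_usage}
    return {name: usage / total for name, usage in raw_usage.items()}


def level_fs(norm_shares: float, effectv: float) -> float:
    """Norm Shares divided by Effectv Usage; no usage gives inf, as in sshare."""
    if effectv == 0:
        return math.inf
    return norm_shares / effectv


# Raw Usage of some scitas-ge users from the example (the account total is 1376);
# aubort stands in for all the users with no recent usage
raw = {"aubort": 0, "eroche": 344, "qubit": 351, "rezzonic": 681}
norm_shares = 0.043478  # Norm Shares of every user in the example output

eff = effectv_usage(raw)
levels = {user: level_fs(norm_shares, u) for user, u in eff.items()}

# Higher Level FS means higher priority; below 1 means overconsumption
for user in sorted(levels, key=levels.get, reverse=True):
    lv = levels[user]
    state = "overconsuming" if lv < 1 else "underconsuming" if lv > 1 else "on target"
    print(f"{user:10s} effectv={eff[user]:.6f} level_fs={lv:.6f} {state}")
```

For eroche this gives 344 / 1376 = 0.25 and 0.043478 / 0.25 ≈ 0.1739, matching the table. The other values differ from the sshare output in the last digits, presumably because the displayed Raw Usage and Norm Shares are rounded.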
More information about SLURM, fair-share and fair-tree can be found here:
https://slurm.schedmd.com/overview.html
https://slurm.schedmd.com/priority_multifactor.html
https://slurm.schedmd.com/fair_tree.html