Dispatch Priority under Slurm
The Program Committee allocates resources during each program year.
Each project is allocated a certain number of hours on various
resources, such as CPU core hours, GPU hours, or time on a
particular cluster.
We use a Slurm feature called QoS (Quality of Service) to manage
which projects can run on which partitions and which jobs get
dispatched first.
This is all part of managing fair-share usage.
The only prioritization that is managed by Slurm is the
dispatch or scheduling priority.
All users submit their jobs to be run by Slurm on a
particular resource, such as a Partition.
On a billable or allocated partition, the projects that have
allocated time available should run before those that do not
have an allocation.
This is true regardless of whether it is a Type A, B or C allocation.
An unallocated project is said to be running opportunistically.
On a non-billable or unallocated cluster, all projects are
dispatched at the same priority.
Partitions at Fermilab
We currently have five partitions within the Fermilab LQCD facility.
Our Pi0 cluster has the 'pi' CPU and 'pigpu' GPU computing partitions.
These two are both billable against an allocation.
The other three partitions ('bc', 'ds', and 'exdsg') are all
non-billable.
Therefore all jobs running on these partitions are dispatched at an
equal first-in first-out priority.
There are limits in place to ensure that at least two (in some cases
three) projects are active on a partition at any given time.
Submit host lq.fnal.gov:

| Partition | Description       | Billable | Nodes |
| lq1csl    | LQ1 CPU resources | Yes      | 112   |

Pi0, Bc, Ds clusters -- Submit host lattice.fnal.gov:

| Partition | Description       | Billable | Nodes |
| pi        | Pi0 CPU resources | Yes      | 336   |
| pigpu     | Pi0 GPU resources | Yes      | 32    |
| bc        | Bc CPU resources  | No       | 224   |
| ds        | Ds CPU resources  | No       | 196   |
| exdsg     | Dsg CPU resources | No       | 76    |
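As a sketch, a job is directed to one of these partitions with `sbatch` on the appropriate submit host; the script name and resource requests below are hypothetical examples, not site defaults:

```shell
# From lattice.fnal.gov, select a partition with -p/--partition.
# 'myjob.sh' and the node/time requests are hypothetical.
sbatch -p pi --nodes=4 --time=24:00:00 myjob.sh   # billable Pi0 CPU partition
sbatch -p bc --nodes=2 --time=24:00:00 myjob.sh   # non-billable Bc partition (FIFO)
```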
Slurm QoS defined at Fermilab
Jobs submitted to Slurm are associated with an appropriate
QoS (or Quality of Service) configuration.
Admins assign parameters to a QoS that are used to manage
dispatch priority and resource use limits.
Additional limits can be defined at the Account or
user (association) level.
| QoS    | Description               | Priority | MaxTRES | MaxWall  | MaxJobs | MaxSubmit |
| test   | quick tests of scripts    | 500      | cpu=32  | 00:30:00 | 1       | 3         |
| normal | Normal QOS (default)      | 300      |         |          |         |           |
| fifo   | simple first-in first-out | 10       |         | 08:00:00 | 125     |           |
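The QoS settings in the table above can be listed directly from Slurm; a query along these lines should work (standard `sacctmgr` field names, though the exact set configured at Fermilab may differ):

```shell
# List the defined QoS entries and the limit fields shown above.
sacctmgr show qos format=Name,Priority,MaxTRES,MaxWall,MaxJobsPU,MaxSubmitPU
```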
The default QoS for all allocated projects using a billable
partition is called 'normal'.
The default QoS for all jobs on a non-billable
partition is called 'fifo'.
It is a simple first-in first-out dispatch priority.
Both of these run at a wall-time
limit of 24 hours.
We have defined a 'test' QoS for users to run small test
jobs to see that their scripts work and their programs
run as expected.
These test jobs run at a relatively high priority
so that they will start as soon as nodes are available.
Any user can have no more than three test jobs submitted and no more
than one test job running at any given time.
Test jobs are limited to 30 minutes of wall-time and just two nodes.
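A test submission that stays within those limits might look like the following (the script name is a hypothetical example):

```shell
# Request the high-priority 'test' QoS; stay within its limits
# (at most 2 nodes and 30 minutes). 'myjob.sh' is hypothetical.
sbatch --qos=test -p pi --nodes=2 --time=00:30:00 myjob.sh
```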
The billable partitions also have a QoS defined as 'opp' for
opportunistic or unallocated running.
This QoS has a simple priority of 0 (zero) with a wall-time
limit of just 8 hours.
Opportunistic jobs will only run when there are nodes sitting idle.
When a project uses up all of the hours it was allocated
for the program year, its jobs will be limited to the 'opp' QoS.
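To run opportunistically, or to check which QoS your queued jobs were assigned, something like the following should work (the script name is hypothetical, and `squeue` output fields vary by Slurm version):

```shell
# Submit under the opportunistic QoS (8-hour wall-time limit).
sbatch --qos=opp -p pi --time=08:00:00 myjob.sh

# Show the QoS, state, and time limit of your jobs.
squeue -u $USER --Format=jobid,qos,state,timelimit
```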
Slurm commands to see current priorities
This section is still under development.
Your patience is appreciated.
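In the meantime, a few standard Slurm commands can already report dispatch priorities; this is a sketch, and the exact columns and weights depend on the site configuration:

```shell
# Show the priority factors Slurm computed for each pending job.
sprio -l

# Show pending jobs with their priority values, highest first.
squeue --sort=-p --Format=jobid,partition,qos,prioritylong,state

# Show the priority weights configured for the scheduler.
sprio -w
```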
This page last updated 24 Sept 2019.
If you have questions or feedback regarding this policy, please
send email to