The Software Program Committee allocates resources during each program year. Each project is allocated a certain number of hours on various resources, such as CPU core hours, GPU hours, or time on an Institutional Cluster. We use a Slurm feature called QoS (Quality of Service) to manage both project access to partitions and job dispatch priority. This is all part of maintaining fair-share usage of allocated resources.
The only prioritization that is managed by Slurm is the dispatch or scheduling priority. All users submit their jobs to be run by Slurm on a particular resource, such as a Partition. On a billable or allocated partition, the projects that have allocated time available should run before those that do not have an allocation. This is true regardless of whether it is a Type A, B or C allocation. An unallocated project is said to be running opportunistically.
Partitions at Fermilab
We currently have a single partition within the Fermilab Lattice QCD Computing Facility. The LQ1 cluster has the 'lq1csl' CPU computing partition. This partition is billable against an allocation. There are limits in place to make sure that at least two (in some cases three) projects can be active at any given time.
The default QoS for all allocated projects is called normal. The default QoS for all projects without a current allocation is called opp (opportunistic). Jobs in the opp QoS are all dispatched at the same priority, but they will not start while normal jobs are waiting in the queue. Both QoS levels have a default wall-time limit of 8 hours; the normal QoS has a MaxWall limit of 24 hours.
We have defined a test QoS for users to run small test jobs to check that their scripts work and their programs run as expected. These test jobs run at a relatively high priority so that they will start as soon as nodes are available. Each user may have at most three jobs submitted and at most one job running at any given time. Test jobs are limited to 30 minutes of wall-time and just two nodes (limit gpu=80).
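As a minimal sketch of how a test job might be submitted within these limits, the wrapper below passes the partition, QoS, node count, and wall-time to sbatch. The function name submit_test_job is hypothetical, and the sketch assumes the QoS is literally named "test"; the page does not confirm the exact name, so check it before relying on this.

```shell
# Hypothetical helper: submit a short test job under the test QoS.
# Assumption: the QoS is literally named "test" (not confirmed by this page).
# Usage: submit_test_job <account> <script> [extra sbatch options...]
submit_test_job() {
    if [ "$#" -lt 2 ]; then
        echo "usage: submit_test_job <account> <script> [sbatch options]" >&2
        return 1
    fi
    stj_account=$1
    stj_script=$2
    shift 2
    # Stay within the limits described above: two nodes, 30 minutes.
    sbatch --account="$stj_account" \
           --partition=lq1csl \
           --qos=test \
           --nodes=2 \
           --time=00:30:00 \
           "$@" "$stj_script"
}
```

For example, `submit_test_job mslight job.sh --job-name=check` would request two nodes for 30 minutes, where job.sh stands in for your own batch script.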
The opp QoS, used for opportunistic or unallocated running, has a simple priority of 10 and a wall-time limit of 8 hours. Opportunistic jobs will only run when there are nodes sitting idle. When a project uses up all of the hours it was allocated for the program year, its jobs will be limited to the opp QoS.
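To check the priority and wall-time limit currently configured for a QoS, something like the following sketch could be used. The function name qos_limits is hypothetical; the sacctmgr "show qos" command and its Name, Priority, and MaxWall format fields are standard Slurm, but the values returned depend on the site configuration.

```shell
# Hypothetical helper: print "name|priority|maxwall" for one QoS.
# -n drops the header line; -P selects pipe-delimited output
# without a trailing delimiter.
# Usage: qos_limits <qos-name>
qos_limits() {
    sacctmgr -n -P show qos format=Name,Priority,MaxWall |
        awk -F'|' -v name="$1" '$1 == name'
}
```

Running `qos_limits opp` on a login node would print a single line for the opp QoS, or nothing if no QoS by that name exists.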
To see the list of jobs currently in queue by partition, visit our cluster status web page. Click on the "Start Time" column header to sort the table by start time. For running jobs, this is the actual time that the jobs started. Following that are the Pending jobs in the predicted order they will start.
From a command line, Slurm's 'squeue' command lists the jobs that are queued. It includes running jobs as well as those waiting to be started (dispatched). By changing the format of the command's output, one can get additional information about each job, such as:
- Start time - actual or predicted
- QoS the job is running under
- Reason that the job is pending
- Calculated real-time dispatch priority of the job
The following is just a sample output. Use your project name after the "-A" option to get a listing of jobs for your account.
--(130)> squeue -o "%.8a %.8u %.6i %.12j %.12P %.8q %.6Q %.2t %.s %.10S %.10l %R" --sort='-p' -A mslight
 ACCOUNT     USER  JOBID         NAME    PARTITION      QOS PRIORI ST START_TIME TIME_LIMIT NODELIST(REASON)
 mslight  bazavov 105194     t144b67a       lq1csl   normal 170116  R 2020-11-23    9:00:00 lq1wn[018,021,023,043,045,073-074,107,165-166,168-170]
 mslight  bazavov 105195     t144b67a       lq1csl   normal 168838 PD        N/A    9:00:00 (Dependency)
 mslight  bazavov 105196     t144b67a       lq1csl   normal 168838 PD        N/A    9:00:00 (Dependency)
 mslight  bazavov 105197     t144b67a       lq1csl   normal 168838 PD        N/A    9:00:00 (Dependency)
 mslight  bazavov 105198     t144b67a       lq1csl   normal 168838 PD        N/A    9:00:00 (Dependency)
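When a project has many jobs waiting, it can help to summarize why they are pending rather than reading the full listing. The sketch below counts jobs per pending reason; the function name count_pending_reasons is hypothetical, and it only post-processes text, so it assumes squeue has been asked to print one reason per line.

```shell
# Hypothetical helper: count jobs per pending reason.
# Reads one reason per line on stdin and prints "count reason",
# most common reason first.
count_pending_reasons() {
    sort | uniq -c | sort -rn |
        awk '{ count = $1; $1 = ""; sub(/^ /, ""); print count, $0 }'
}

# Example (requires Slurm): -h drops the header, -t PENDING keeps only
# waiting jobs, and the %r format prints just the pending reason.
#   squeue -h -A mslight -t PENDING -o "%r" | count_pending_reasons
```

As with the sample above, replace mslight with your own project name after the "-A" option.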