A partition is a group of nodes. Slurm lets administrators define several partitions so that different types of jobs can be handled separately. We don’t need this functionality and don’t use it: only one partition (“all”) is defined, and it is the default for everyone.
Quality of Service (QoS) is another Slurm entity, used to group job limits and priorities, that works alongside the Fair Share queueing system. We use account-specific QoS. Don’t try to change your job’s QoS – it won’t work (by design).
Accounts are groups of users. Each course has its own account. All projects belong to a single account. A user can belong to more than one account (if they enroll in more than one course or project).
Every user also has a Default Account, which is used whenever a job is submitted without specifying an account.
If you’re enrolled in more than one course (or a course and a project) and have received permission to use Lambda on both, your user belongs to more than one account. Generally, the last course you enrolled in will be your default account. Because every account has its own resource limits and billing, you can choose a different account for each job you launch, in order to make full use of your resource permissions:
Run ‘sacctmgr show user $USER withassoc’ to view which accounts your user belongs to.
Use the ‘--account=<account>’ or ‘-A <account>’ arguments to choose which account the job runs on (e.g. srun -A projects --pty /bin/bash).
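The two steps above can be combined into a small helper. This is only a sketch: the function name build_srun_cmd is hypothetical (it is not part of Slurm), and the account name projects is taken from the example above. The sacctmgr call is guarded so the snippet also runs on a machine without Slurm installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): build the srun command line
# for a given account. With no argument, the -A flag is omitted and
# Slurm falls back to your default account.
build_srun_cmd() {
    account="${1:-}"
    if [ -n "$account" ]; then
        echo "srun -A $account --pty /bin/bash"
    else
        echo "srun --pty /bin/bash"
    fi
}

# List the accounts your user belongs to (only where Slurm is available).
if command -v sacctmgr >/dev/null 2>&1; then
    sacctmgr show user "$USER" withassoc
fi

# Print the command that would start an interactive shell on the
# "projects" account.
build_srun_cmd projects
```

Replace projects with any account listed by sacctmgr for your user; run the printed command to start the interactive session.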
Effective job limits
These will be modified over time according to resource availability and needs:
- Maximum number of running jobs per user: 1
- Maximum combined running and queued jobs per user: 5
- Maximum running time per job: 1 day
- Maximum allowed GPUs to be allocated per job: 1
- Total maximum allowed GPUs to be allocated per user concurrently: 1
- Total maximum allowed CPUs to be allocated per user concurrently: 2
- Maximum number of running jobs per user: 3
- Maximum combined running and queued jobs per user: 10
- Maximum running time per job: 7 days
- Total maximum allowed GPUs to be allocated per user concurrently: 3
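As an illustration of staying within these limits, the sketch below writes a batch script that requests the stricter of the values listed above (1 GPU, 2 CPUs, 1 day). It rests on assumptions: that GPUs are requested through the generic resource name gpu, and that train.py stands in for your own program. The job name and the file name job.sh are likewise placeholders.

```shell
#!/bin/sh
# Write a batch script that stays within the stricter limits listed above.
# Assumptions: GPUs are exposed as the generic resource "gpu", and
# train.py is a hypothetical placeholder for your own program.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --time=1-00:00:00
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=2
python train.py
EOF
```

Submit the script with sbatch job.sh, and use squeue -u $USER to see how many of your jobs are currently running or queued.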