SLURM - AI & HPC Site

What is Slurm?

As the name suggests, Slurm Workload Manager is software that allocates resources for computational jobs according to predefined settings.
Unlike regular (local) program execution, programs (jobs) are submitted from the Login (or Controller) server and distributed to one or more of the physical servers (Nodes). Slurm helps define and execute these jobs, and manages users, permissions and resource allocation. It also tracks and displays job details.

Useful Commands

After logging in, you’ll be able to use these commands:

sinfo -N – shows the state of each node:

idle: No job is currently running on the node.
mix: Jobs are running on the node and use part of its resources; the rest are free.
down: The node is unavailable because of an error.
drain (drng): The node is waiting for its jobs to finish and won’t accept new jobs (usually before a reboot).
boot: The node is rebooting (usually after an update).
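The node states above can be grouped with a short script. This is a minimal sketch, not a site-provided tool: the function name nodes_by_state is hypothetical, and it assumes input lines of the form "NODENAME STATE". sinfo may append a flag character to a state (for example "idle*" for a node that is not responding), so only the leading letters of the state are kept.

```python
import re
from collections import defaultdict


def nodes_by_state(sinfo_output: str) -> dict:
    """Group node names by state from lines of 'NODENAME STATE'."""
    groups = defaultdict(set)
    for line in sinfo_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        node, state = parts
        # Keep only the leading letters, dropping flags such as '*'.
        match = re.match(r"[A-Za-z_]+", state)
        if match is None:
            continue
        # A set removes duplicates when a node belongs to several partitions.
        groups[match.group(0)].add(node)
    return {state: sorted(nodes) for state, nodes in groups.items()}
```

The input for this function can be produced with sinfo -N --noheader --format="%N %t", where %N is the node name and %t is the short state. For example, nodes_by_state(text).get("idle", []) lists the nodes that are currently free.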

snode – shows the CPU/GPU status per node.

squeue – shows the currently running and pending jobs. Use ‘squeue -u $USER’ to view only your jobs.

Job states (ST):

- R: The job is running.
- PD: The job is pending (waiting in queue to run).
- CG: The job is completing and will be finished soon.

If the job is pending (PD), the NODELIST(REASON) column will indicate why:

- Priority: A higher priority job is pending execution.
- Resources: The job has the highest priority and is waiting for resources.
- QOS*Limit: You have reached the maximum number of running jobs. The job will launch when another one finishes. Refer to the Job and resource limits section.
- launch failed requeued held: Job launch failed for an unknown reason. Use scontrol to requeue the job. Contact us if the problem persists.

To show full details of the listed jobs, the following format is very useful:

squeue --Format=" JobID:9,Account:8,Partition:10,UserName:20,NodeList:15,TimeUsed:13,tres-alloc:55,State:10,Reason:40"
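When many jobs are pending, it can help to count them per reason. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and it assumes squeue is asked for one "JOBID|REASON" line per pending job using the %i (job ID) and %r (reason) format fields.

```python
import getpass
import subprocess
from collections import Counter


def parse_pending(squeue_output: str) -> Counter:
    """Count pending jobs per reason from lines of 'JOBID|REASON'."""
    reasons = Counter()
    for line in squeue_output.splitlines():
        line = line.strip()
        if not line:
            continue
        _job_id, _, reason = line.partition("|")
        reasons[reason.strip()] += 1
    return reasons


def my_pending_reasons() -> Counter:
    """Query squeue for the current user's pending jobs (needs Slurm on PATH)."""
    result = subprocess.run(
        ["squeue", "-u", getpass.getuser(), "--states=PD",
         "--noheader", "--format=%i|%r"],
        capture_output=True, text=True, check=True,
    )
    return parse_pending(result.stdout)
```

On the login server, my_pending_reasons().most_common() would list the reasons from most to least frequent. The parsing function can also be used on its own with saved squeue output.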

scontrol – used to control jobs. You can only control your own jobs.

- scontrol show job <id>: show details for job with id <id> (JOBID column in squeue).
- scontrol requeue <id>: requeue job with id <id>

scancel – cancels a job. You can only cancel your own jobs.

- scancel <job id>
- scancel --me # cancel all jobs for the current user

Your home folder is /home/<username>. It is shared across the network, so it is the same folder on all the nodes. If you’re coming from the Rishon cluster, your home folder is the same.
Important: Although the storage system is fault-tolerant, back up your files regularly to an external location. Make sure you have a backup when you finish a course or a project, because home folders of inactive users will be deleted without prior notice.

Note: Using SSH from a command shell (on Windows, Linux or Mac) is possible, but you may want to take advantage of a free graphical client such as MobaXterm. It also provides an SFTP file explorer for easy drag-and-drop file transfer to and from the server.

Priority calculation

This is a rough description of the priority calculation:
Each resource has a billing “cost”. When a job completes, its total cost is calculated using the allocated resources’ cost and the job’s run time. The user is then “billed” for that cost.
When the user submits another job, it enters the queue. Its priority is determined relative to the other queued jobs: the lower the user’s bill, the higher the priority of their job.
There are some other parameters used for calculating priority. One of them is queued time – the longer a job waits in the queue, the higher its priority.
Another layer of Fair Share is the Account system: Each course has an account (usually the course’s number). Projects are all in the same account. Billing is also calculated across accounts. Thus, if course A has a lower total billing cost (combined billing costs of all jobs executed from that account) than course B, jobs queued from course A will get priority over course B’s jobs.
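The description above can be illustrated with a toy model. This is a sketch only, not Slurm's actual formula: the cost weights, names and bills are hypothetical, and the strict ordering (account bill first, then user bill, then waiting time) is a simplification of the weighted calculation the scheduler really performs.

```python
from dataclasses import dataclass

# Hypothetical per-hour billing weights; the real values are set by the admins.
CPU_COST = 1.0
GPU_COST = 10.0


def job_cost(cpus: int, gpus: int, hours: float) -> float:
    """Cost of a finished job: cost of the allocated resources times run time."""
    return (cpus * CPU_COST + gpus * GPU_COST) * hours


@dataclass
class QueuedJob:
    name: str
    user: str
    account: str
    hours_waiting: float


def order_queue(jobs, user_bill, account_bill):
    """Return jobs from highest to lowest priority in this toy model."""
    return sorted(
        jobs,
        key=lambda j: (account_bill[j.account], user_bill[j.user], -j.hours_waiting),
    )


# User x has the higher personal bill, but x's account has the lower total.
user_bill = {"x": job_cost(8, 2, 3.0), "y": job_cost(4, 1, 2.0)}
account_bill = {"course_a": 84.0, "course_b": 500.0}
queue = [
    QueuedJob("y_job", "y", "course_b", hours_waiting=5.0),
    QueuedJob("x_new", "x", "course_a", hours_waiting=0.5),
    QueuedJob("x_old", "x", "course_a", hours_waiting=2.0),
]
for job in order_queue(queue, user_bill, account_bill):
    print(job.name)
```

In this made-up queue, both jobs from course_a are ordered ahead of the job from course_b because the account's total bill is lower, and between the two course_a jobs the one that has waited longer goes first.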

Some notes:

Cost is calculated for allocated resources, not used resources: If you ask for two GPUs, for example, and only use one, you will still be billed for two GPUs. That’s one reason you shouldn’t request more resources than you need.
Currently, only CPUs and GPUs have a ‘cost’.
There may be situations where user X from course A has a higher billing than user Y from course B, and yet X’s jobs receive a higher priority than Y’s. This is the desired behavior: it prevents the cluster from being choked by a specific course.
Billing costs decay over time.
Billing cost is for priority calculation only. You won’t be charged real money.
As mentioned at the beginning of this section – you will only notice the queueing system when the cluster is full and there aren’t enough available resources to allocate to your job.
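Two of the notes above, billing for allocated resources and decay over time, can be shown with a small calculation. This is a hedged sketch: the cost weights and the 7-day half-life are hypothetical values, and the exponential half-life form of the decay is an assumption, since only the fact that billing costs decay is stated here.

```python
def job_cost(allocated_cpus: int, allocated_gpus: int, hours: float,
             cpu_cost: float = 1.0, gpu_cost: float = 10.0) -> float:
    """Bill for what was allocated, whether or not it was actually used."""
    return (allocated_cpus * cpu_cost + allocated_gpus * gpu_cost) * hours


def decayed_bill(bill: float, days_elapsed: float,
                 half_life_days: float = 7.0) -> float:
    """Assumed exponential decay: the bill halves every half_life_days."""
    return bill * 0.5 ** (days_elapsed / half_life_days)


# Same 5-hour job, requested with two GPUs versus one GPU.
two_gpus = job_cost(allocated_cpus=4, allocated_gpus=2, hours=5.0)
one_gpu = job_cost(allocated_cpus=4, allocated_gpus=1, hours=5.0)
print(two_gpus, one_gpu)

# The larger bill as it would look one and two half-lives later.
print(decayed_bill(two_gpus, 7.0), decayed_bill(two_gpus, 14.0))
```

With these made-up weights, asking for a second GPU that is never used raises the bill for the same job, and under the assumed half-life an old bill counts for less and less as time passes.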