Slurm has two commands which you can use to launch jobs: srun and sbatch.
srun is straightforward: srun <arguments> <program>
You can use various arguments to control the resource allocation of your job, as well as other settings your job requires.
You can find the full list of available arguments here. Some of the most common arguments are:
-c # – allocate # CPUs per task.
--gres=gpu:# – allocate # GPUs.
--mpi=<mpi_type> – define the MPI type to use. The default (when not explicitly specified) is pmi2.
--pty – run the job in pseudo-terminal mode. Useful when running an interactive shell such as bash.
-p <partition> – run the job on the selected partition instead of the default one.
-A <account> – specify the account associated with the selected partition; needed when running on a partition other than the default one.
-w <node> – run job on a specific node.
Check out the Partitions, accounts and QoS section for more on selecting partitions and nodes.
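To check which partitions and accounts are available to you before choosing -p and -A, the standard Slurm tools can help (a minimal sketch; the exact output depends on the cluster's configuration):
sinfo                                                      # list partitions and the nodes in them
sacctmgr show assoc user=$USER format=account,partition    # list the accounts and partitions associated with your user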
Examples:
1. To get a shell with two GPUs, run:
srun --gres=gpu:2 --pty /bin/bash
Run ‘nvidia-smi’ inside the shell to verify that the job received two GPUs.
2. Run the script ‘script.py’ using python3:
srun python3 script.py
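3. A combined interactive job that uses several of the arguments above (the partition and account names are placeholders; replace them with the values relevant to your group):
srun -c 4 --gres=gpu:1 -p <partition> -A <account> --pty /bin/bash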
sbatch lets you submit jobs using a Slurm batch script, where #SBATCH lines at the top of the script set the job’s options.
Example:
#!/bin/bash
#SBATCH -c 8 # number of cores (threads)
#SBATCH --gres=gpu:A40:1 # request one A40 GPU
#SBATCH --mail-user=[user]@cs.technion.ac.il
#SBATCH --mail-type=ALL # Valid values are NONE, BEGIN, END, FAIL, REQUEUE, ALL
#SBATCH --job-name="JobName"
#SBATCH -o ./out_job%j.txt # stdout goes to out_job<jobid>.txt (%j is replaced by the job ID)
#SBATCH -e ./err_job%j.txt # stderr goes to err_job<jobid>.txt
module purge # clear the list of loaded modules
module load matlab/R2023a # load the MATLAB R2023a module
conda activate [your miniconda environment]
python main.py # run your script
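Assuming the script above is saved as job.sh (the filename is only an example), you submit it and follow its progress with:
sbatch job.sh     # submit the script; Slurm prints the assigned job ID
squeue -u $USER   # show the status of your pending and running jobs
scancel <jobid>   # cancel the job if needed
Standard output and error will appear in the files given by -o and -e once the job starts.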
You can find more information here.