SLURM Usage¶
SLURM is used to submit jobs on the different partitions from the Dalek frontend node. The available partitions are listed in the description page.
Info
SLURM version 24.11 is currently installed on the cluster.
Frontend Connection¶
It is recommended to add a few lines to your ~/.ssh/config file, as explained
in the SSH access section. Then, to connect to the frontend
from your computer, you only need to run:
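For example, assuming your ~/.ssh/config defines a host alias named dalek for the frontend (the alias name is illustrative and depends on your SSH configuration):
$ ssh dalek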
Basic SLURM Commands¶
This page presents some useful commands to start using SLURM. Of course, this is not exhaustive and we strongly advise users to browse the full SLURM documentation available online.
sinfo¶
sinfo lists the available partitions:
$ sinfo -l
Sat Sep 20 16:39:03 2025
PARTITION AVAIL TIMELIMIT JOB_SIZE ROOT OVERSUBS GROUPS NODES STATE NODELIST
az4-n4090 up infinite 1-infinite no NO all 4 idle~ az4-n4090-[0-3]
az4-a7900 up infinite 1-infinite no NO all 4 idle~ az4-a7900-[0-3]
az4-mixed up infinite 1-infinite no NO all 8 idle~ az4-n4090-[0-3],az4-a7900-[0-3]
iml-ia770 up infinite 1-infinite no NO all 4 idle~ iml-ia770-[0-3]
az5-a890m up infinite 1-infinite no NO all 4 idle~ az5-a890m-[0-3]
dalek* up infinite 1-infinite no NO all 16 idle~ az4-a7900-[0-3],az4-n4090-[0-3],az5-a890m-[0-3],iml-ia770-[0-3]
The special character after the state indicates:
- ~: The node is presently powered off
- #: The node is presently being powered up or configured
- !: The node is pending power down
- %: The node is presently being powered down
- @: The node is pending reboot
If there is no special character, it means that the node is powered on.
srun¶
srun -p [partition] [command] runs a command on a given partition. For example, you can submit a job that executes the hostname command on two nodes of the
az4-n4090 partition.
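A minimal sketch of this submission, assuming srun's -N option is used to request the two nodes:
$ srun -p az4-n4090 -N 2 hostname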
srun -w [node] --interactive runs an interactive session on a specific node, for example on the
az4-a7900-1 node. You can add --exclusive to prevent
other users from connecting to this node at the same time.
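A possible invocation for the az4-a7900-1 node, following the options described above (exact usage may depend on the site configuration):
$ srun -w az4-a7900-1 --interactive
$ srun -w az4-a7900-1 --interactive --exclusive  # exclusive reservation, if your QOS allows it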
Warning
Depending on your SLURM QOS configuration you might not be able to reserve a node exclusively.
Info
An easier way to connect interactively to the nodes is to use a custom
~/.ssh/config file as detailed in the SSH Access
page.
sbatch¶
sbatch [script file] runs a SLURM script on the cluster. This is the
preferred way to use the Dalek cluster.
Here is a basic example of a SLURM batch file:
#!/bin/bash
# SLURM options:
# --------------
#SBATCH --job-name=my_job_name # Name of the job
#SBATCH --output=my_job_name_%j.log # Logs of the job (stdout and stderr)
#SBATCH --partition=az4-n4090 # Select the partition (dalek by default)
#SBATCH --nodes=2 # Number of nodes to reserve
#SBATCH --ntasks=4 # Maximum number of processes to run
#SBATCH --cpus-per-task=3 # Number of CPUs (or cores here) per task
#SBATCH --time=01:00:00 # Maximum duration of the job (hh:mm:ss)
# Commands to submit:
# (executed on one node)
# ----------------------
echo "Start of the job"
date
pwd # where are we
hostname # node(s) hostname
# basic environment variables that can be used in a batch script
echo "The id of this job is: $SLURM_JOB_ID"
echo "The name of this job is: $SLURM_JOB_NAME"
echo "This job is executed on the following node(s): $SLURM_JOB_NODELIST"
echo "The number of tasks of the job is: $SLURM_NTASKS"
echo "The number of CPUs per task is: $SLURM_CPUS_PER_TASK"
# sleep for 60 seconds
sleep 60
echo "End of the job"
date
And to submit the batch script:
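For example, assuming the script above is saved as my_job.sh (the filename is illustrative), sbatch prints the id of the submitted job:
$ sbatch my_job.sh
Submitted batch job 1819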
When the job is executed, it will produce a my_job_name_[JID].log file
in the directory where the sbatch command was executed.
squeue¶
squeue -l allows you to view the jobs currently submitted on the cluster:
$ squeue -l
Sat Sep 20 17:32:08 2025
JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
1819 az4-n4090 my_job_n cassagne RUNNING 0:07 1:00:00 2 az4-n4090-[0-1]
Here, the cassagne user has a job running on the
az4-n4090 partition, using two nodes (az4-n4090-0 and
az4-n4090-1). It is important to note that each job has a unique
identifier (see the JOBID column).
scancel¶
- scancel [JOBID] cancels a job given its job id.
- scancel -u [USER] cancels all the jobs of a given user.
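For example, using the job id and user name from the squeue output above:
$ scancel 1819
$ scancel -u cassagne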