SLURM Usage

SLURM is used to submit jobs on the different partitions from the Dalek frontend node. The available partitions are listed on the description page.

Info

SLURM version 24.11 is currently installed on the cluster.

Frontend Connection

It is recommended to add a few lines to your ~/.ssh/config file, as explained in the SSH Access section. Then, to connect to the frontend from your computer, simply run:

ssh front.dalek.lip6
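
For reference, a minimal sketch of what such an entry could look like is shown below. This is hypothetical: my_login is a placeholder, and the exact settings to use are documented in the SSH Access section.

~/.ssh/config
# Hypothetical sketch: my_login is a placeholder,
# see the SSH Access section for the exact entries.
Host front.dalek.lip6
    User my_login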

Basic SLURM Commands

This page presents some useful commands to start using SLURM. It is not exhaustive, and we strongly advise users to browse the full SLURM documentation available online.

sinfo

sinfo lists the available partitions:

$ sinfo -l
Sat Sep 20 16:39:03 2025
PARTITION  AVAIL  TIMELIMIT    JOB_SIZE  ROOT  OVERSUBS  GROUPS  NODES  STATE  NODELIST
az4-n4090     up   infinite  1-infinite    no        NO     all      4  idle~  az4-n4090-[0-3]
az4-a7900     up   infinite  1-infinite    no        NO     all      4  idle~  az4-a7900-[0-3]
az4-mixed     up   infinite  1-infinite    no        NO     all      8  idle~  az4-n4090-[0-3],az4-a7900-[0-3]
iml-ia770     up   infinite  1-infinite    no        NO     all      4  idle~  iml-ia770-[0-3]
az5-a890m     up   infinite  1-infinite    no        NO     all      4  idle~  az5-a890m-[0-3]
dalek*        up   infinite  1-infinite    no        NO     all     16  idle~  az4-a7900-[0-3],az4-n4090-[0-3],az5-a890m-[0-3],iml-ia770-[0-3]

The special character after the state indicates:

  • ~: The node is presently powered off
  • #: The node is presently being powered up or configured
  • !: The node is pending power down
  • %: The node is presently being powered down
  • @: The node is pending reboot

If there is no special character it means that the node is powered on.
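
To check the individual state of the nodes of a single partition, sinfo can also be restricted to that partition and switched to a node-oriented view (the partition name here is taken from the listing above):

$ sinfo -p az4-n4090 -N -l

This prints one line per node, with its own state and resources.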

srun

srun -p [partition] [command] runs a command on a given partition:

$ srun -p az4-n4090 -N 2 hostname
az4-n4090-1
az4-n4090-3

Submission of a job that executes the hostname command on two nodes of the az4-n4090 partition.
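
srun also accepts the usual resource selection options. For instance, the following variant (a sketch reusing the same partition) launches two tasks per node on two nodes, printing four hostnames in total:

$ srun -p az4-n4090 -N 2 --ntasks-per-node=2 hostname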

srun -w [node] --interactive runs an interactive session on a specific node:

srun -w az4-a7900-1 --interactive --pty /bin/bash
Interactive job on the az4-a7900-1 node. You can add --exclusive to prevent other users from connecting to this node at the same time.

Warning

Depending on your SLURM QOS configuration you might not be able to reserve a node exclusively.

Info

An easier way to connect interactively to the nodes is to use a custom ~/.ssh/config file as detailed in the SSH Access page.
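
As an illustration only, such a configuration could rely on the frontend as a jump host. This is a hypothetical sketch: my_login is a placeholder and the exact entries are given on the SSH Access page.

~/.ssh/config
# Hypothetical sketch: jump through the frontend to reach a compute node;
# the exact configuration is detailed on the SSH Access page.
Host az4-a7900-1
    ProxyJump front.dalek.lip6
    User my_login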

sbatch

sbatch [script file] submits a SLURM batch script to the cluster. This is the preferred way to use the Dalek cluster.

Here is a basic example of SLURM batch file:

example.sb
#!/bin/bash

# SLURM options:
# --------------

#SBATCH --job-name=my_job_name      # Name of the job
#SBATCH --output=my_job_name_%j.log # Logs of the job (stdout and stderr)
#SBATCH --partition=az4-n4090       # Select the partition (dalek by default)
#SBATCH --nodes=2                   # Number of nodes to reserve
#SBATCH --ntasks=4                  # Maximum number of tasks (processes) to run
#SBATCH --cpus-per-task=3           # Number of CPUs (or cores here) per task
#SBATCH --time=01:00:00             # Maximum duration of the job (hh:mm:ss)

# Commands to submit:
# (executed on one node)
# ----------------------

echo "Start of the job"
date

pwd # where are we
hostname # node(s) hostname

# basic environment variable that can be used in a batch script
echo "The id of this job is: $SLURM_JOB_ID"
echo "The name of this job is: $SLURM_JOB_NAME"
echo "This job is executed on the following node(s): $SLURM_JOB_NODELIST"
echo "The number of tasks of the job is: $SLURM_NTASKS"
echo "The number of CPUs per task is: $SLURM_CPUS_PER_TASK"

# sleeps 60 seconds
sleep 60

echo "End of the job"
date

And to submit the batch script:

sbatch example.sb

When the job is executed, it produces a my_job_name_[JID].log file in the directory from which it was submitted.
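
Note that the script body itself runs once, on the first allocated node. To spread a command over the reserved tasks, it can be launched with srun from inside the script, as in this short sketch:

# Added at the end of example.sb:
# srun launches the command once per task ($SLURM_NTASKS times here),
# spread across the allocated nodes.
srun hostname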

squeue

squeue -l allows you to view the jobs currently submitted on the cluster:

$ squeue -l
Sat Sep 20 17:32:08 2025
JOBID  PARTITION      NAME      USER    STATE  TIME  TIME_LIMI  NODES  NODELIST(REASON)
 1819  az4-n4090  my_job_n  cassagne  RUNNING  0:07    1:00:00      2  az4-n4090-[0-1]

For instance, here one job from the cassagne user is running on the az4-n4090 partition, using two nodes (az4-n4090-0 and az4-n4090-1). Note that each job has a unique identifier (see the JOBID column).
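
To list only your own jobs, the output can be filtered by user:

$ squeue -u $USER -l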

scancel

  • scancel [JOBID] cancels a job with its job id.

  • scancel -u [USER] cancels all the jobs for a given user.
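
For example, to cancel the job shown in the squeue output above:

scancel 1819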