SLURM Usage

SLURM is used to submit jobs to the different partitions from the Dalek frontend node. The available partitions are listed on the description page.

Frontend Connection

It is recommended to add a few lines to your ~/.ssh/config file, as explained in the SSH access section. Then, to connect to the frontend from your computer, you only need to run:

ssh front.dalek.lip6
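
If the frontend is reachable directly, a minimal ~/.ssh/config entry for the command above could look like the sketch below; your_lip6_login is a placeholder, and the full recommended configuration is detailed in the SSH access section.

Host front.dalek.lip6
    User your_lip6_login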

Basic SLURM Commands

Here are some useful commands to get started with SLURM:

  • sinfo -l lists the available partitions

    $ sinfo -l
    Wed Jan 01 00:00:00 2025
    PARTITION AVAIL  TIMELIMIT   JOB_SIZE ROOT OVERSUBS     GROUPS  NODES       STATE NODELIST
    az4-n4090    up   infinite 1-infinite   no       NO        all      4        idle az4-n4090-[0-3]
    az4-a7900    up   infinite 1-infinite   no       NO        all      4        idle az4-a7900-[0-3]
    az4-mixed    up   infinite 1-infinite   no       NO        all      8        idle az4-n4090-[0-3],az4-a7900-[0-3]
    iml-ia770    up   infinite 1-infinite   no       NO        all      4        idle iml-ia770-[0-3]
    az5-a890m    up   infinite 1-infinite   no       NO        all      4        idle az5-a890m-[0-3]
    

  • srun -p [partition] [command] runs a command on a partition

    $ srun -p az4-n4090 -N 4 hostname
    az4-n4090-1
    az4-n4090-3
    az4-n4090-0
    az4-n4090-2
    
    Submission of a job that executes the hostname command on the 4 nodes (-N 4) of the az4-n4090 partition.

  • srun -p [partition] --pty bash -i runs an interactive session on a partition

    $ srun -p az4-a7900 --exclusive --pty bash -i
    
    Interactive job on a node of the az4-a7900 partition. The --exclusive option ensures that no other users' jobs can run on this node at the same time.

    Info

    An easier way to connect interactively to the nodes is to use a custom ~/.ssh/config file as detailed in the SSH Access page.

  • sbatch [script] submits a SLURM batch script to the cluster
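
    For example, a minimal batch script could look as follows. This is only a sketch: the job name, output file, node count and script name (job.sh) are illustrative choices, and the partition is one of those listed by sinfo -l above.

    #!/bin/bash
    #SBATCH -J hostname-test        # job name
    #SBATCH -p az4-n4090            # partition to submit to (see sinfo -l)
    #SBATCH -N 2                    # number of nodes to allocate
    #SBATCH -o slurm-%j.out         # standard output file (%j expands to the job id)

    srun hostname                   # run hostname on each allocated node

    The script is then submitted with sbatch job.sh and its output is written to the file given by -o.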

  • squeue -l allows you to view the currently submitted jobs on the cluster

    $ squeue -l
    Wed Jan 01 00:00:00 2025
                 JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
                713705 iml-ia770     bash  galveze  RUNNING      12:12 UNLIMITED      2 iml-ia770-[1,3]
    
    For instance, here one job from the galveze user is running on the iml-ia770 partition and occupies two nodes (iml-ia770-1 and iml-ia770-3).

  • scancel [jobid] cancels a job
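
    For instance, to cancel the job with JOBID 713705 shown in the squeue -l output above:

    $ scancel 713705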

  • scancel -u [user] cancels all the jobs for a given user
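
    For instance, to cancel all the jobs of the galveze user from the example above:

    $ scancel -u galveze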