Run Jobs

This page gives basic instructions on how to run and monitor jobs on Slurm. Slurm provides excellent man pages for all of its commands, so refer to them if you have further questions.

Set Up

Load the environment module for Slurm to configure your PATH and Slurm-related environment variables.

module load {{ClusterName}}

The modulefile sets environment variables that control the defaults for Slurm commands. These are documented in the man pages for each command. If you don't like the defaults, set the corresponding variables in your environment (for example, in your .bashrc); the modulefile won't change any variables that are already set. Environment variables can always be overridden by command line options.

For example, the SQUEUE_FORMAT2 and SQUEUE_SORT environment variables are set so that the default output format is easier to read and contains useful information that isn't in the default format.
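
To set your own defaults, export the variables in your .bashrc before loading the module. The format string and time limit below are illustrative values, not the ones the modulefile sets:

# Because these are already set, the modulefile will leave them alone.
export SQUEUE_FORMAT2='jobid:10,partition:12,name:24,username:10,statecompact:6,timeused:12,reasonlist:20'
export SBATCH_TIMELIMIT='2:0:0'   # default time limit for sbatch jobs (2 hours)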

Key Slurm Commands

The key Slurm commands are

| Command | Description | Example |
| --- | --- | --- |
| salloc | Create a compute allocation. | salloc -c 1 --mem 1G -C 'spot&GHz:3.1' |
| srun | Run a job within an allocation. | srun --pty /bin/bash |
| sbatch | Submit a batch script. | sbatch -c 1 --mem 1G -C 'spot&GHz:3.1' script |
| squeue | Get job status. | |
| scancel | Cancel a job. | scancel jobid |
| sinfo | Get info about Slurm node status. | sinfo -p all |
| scontrol | View or modify Slurm configuration and state. | scontrol show node nodename |
| sstat | Display various status information about a running job/step. | |
| sshare | Tool for listing fair-share information. | |
| sprio | View the factors that comprise a job's scheduling priority. | |
| sacct | Display accounting data for jobs. | |
| sreport | Generate reports from the Slurm accounting data. | |
| sview | Graphical tool for viewing cluster state. | |

sbatch

The most common options for sbatch are listed here. For more details run man sbatch.

| Options | Description | Default |
| --- | --- | --- |
| -p, --partition=partition-names | Select the partition(s) to run the job on. | Set by slurm.InstanceConfig.DefaultPartition in the config file. |
| -t, --time=time | Set a limit on the total run time of the job. | SBATCH_TIMELIMIT="1:0:0" (1 hour) |
| -c, --cpus-per-task=ncpus | Number of cores. | 1 |
| --mem=size[units] | Amount of memory. Default unit is M; valid units are K, M, G, and T. | SBATCH_MEM_PER_NODE=100M |
| -L, --licenses=license | Licenses used by the job. | |
| -a, --array=indexes | Submit a job array (see the example at the end of this section). | |
| -C, --constraint=list | Features required by the job. Multiple constraints can be combined with AND (&) and OR (\|). | |
| -d, --dependency=dependency-list | Don't start the job until the dependencies have been satisfied. | |
| -D, --chdir=directory | Set the working directory of the job. | |
| --wait | Do not exit until the job finishes. The exit code of sbatch will be the same as the exit code of the job. | |
| --wrap | Wrap shell commands in a batch script. | |

Run a simulation build followed by a regression

# --parsable makes sbatch print just the job ID so it can be used in the dependency below
build_jobid=$(sbatch --parsable -c 4 --mem 4G -L vcs_build -C 'GHz:4|GHz:4.5' -t 30:0 sim-build.sh)
if sbatch -d "afterok:$build_jobid" -c 1 --mem 100M --wait submit-regression.sh; then
    echo "Regression Passed"
else
    echo "Regression Failed"
fi
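
The --array and --wrap options can be combined to run the same command many times without writing a batch script. A minimal sketch, where the index range and the run-test command are placeholders:

# Submit a 10-task job array; each task reads its index from SLURM_ARRAY_TASK_ID.
sbatch -a 0-9 -c 1 --mem 500M --wrap 'run-test --index $SLURM_ARRAY_TASK_ID'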

srun

The srun command is most often used to open a pseudo terminal on a compute node for running interactive jobs. It accepts most of the same options as sbatch to request CPUs, memory, and node features.

To open a pseudo terminal in your shell on a compute node with 4 cores and 8G of memory, execute the following command.

srun -c 4 --mem 8G --pty /bin/bash

This queues a job; when it is allocated to a node and starts running, control returns to your shell, but stdin and stdout are connected to the compute node. If you set your DISPLAY environment variable and allow external X11 connections, you can use this to run interactive GUI jobs on the compute node and have the windows appear on your instance.

xhost +  # allow X11 connections from other hosts
export DISPLAY=$(hostname):$(echo $DISPLAY | cut -d ':' -f 2)  # keep the display number but point clients at this host
srun -c 4 --mem 8G --pty /bin/bash
emacs .  # or whatever GUI application you want to run; a window should open on your instance

Another way to run interactive GUI jobs is to use srun's --x11 flag to enable X11 forwarding.

srun -c 1 --mem 8G --pty --x11 emacs

squeue

The squeue command shows the status of jobs.

The output format can be customized using the --format or --Format options and you can configure the default output format using the corresponding SQUEUE_FORMAT or SQUEUE_FORMAT2 environment variables.

squeue
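
To show a custom set of columns for just your own jobs, pass a field list to --Format (the fields chosen here are only an illustration):

squeue -u $USER --Format=jobid:10,partition:12,name:24,statecompact:6,timeused:12,numcpus:8,reasonlist:20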

sprio

Use sprio to get information about a job's priority. This can be useful to figure out why a job is scheduled before or after another job.

sprio -j10,11

sacct

The sacct command displays accounting information about jobs. For example, it can be used to compare the CPUs and memory a job requested with the CPU time and memory it actually used.

sacct -o JobID,User,JobName,AllocCPUS,State,ExitCode,Elapsed,CPUTime,MaxRSS,MaxVMSize,ReqCPUS,ReqMem,SystemCPU,TotalCPU,UserCPU -j 44

This shows more detail for all users and clusters:

sacct --allclusters --allusers --federation --starttime 1970-01-01 --format 'Submit,Start,End,jobid%15,State%15,user,account,cluster%15,AllocCPUS,AllocNodes,ExitCode,ReqMem,MaxRSS,MaxVMSize,MaxPages,Elapsed,CPUTime,UserCPU,SystemCPU,TotalCPU' | less

For more information:

man sacct

sreport

The sreport command can be used to generate reports from the Slurm accounting database.
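
For example, to report cluster utilization in hours over a date range (the dates below are placeholders):

sreport -t hours cluster utilization start=2024-01-01 end=2024-02-01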

Other Slurm Commands

Run man followed by the command name to get information about these less commonly used Slurm commands.

| Command | Description |
| --- | --- |
| sacctmgr | View/modify Slurm account information. |
| sattach | Attach to a job step. |
| sbcast | Transmit a file to the nodes allocated to a Slurm job. |
| scrontab | Manage Slurm crontab files. |
| sdiag | Diagnostic tool for Slurm. Shows information related to slurmctld execution. |
| seff | Report the CPU and memory efficiency of a completed job. |
| sgather | Transmit a file from the nodes allocated to a Slurm job. |
| sh5util | Tool for merging HDF5 files from the acct_gather_profile plugin that gathers detailed data for jobs. |
| sjobexitmod | Modify the derived exit code of a job. |
| strigger | Set, get, or clear Slurm trigger information. |
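
For example, if seff is installed on your cluster, it takes a job ID (the ID below is a placeholder):

seff 12345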