---
title: Slurm
breadcrumbs:
- title: High-Performance Computing (HPC)
---
{% include header.md %}

## Usage

### Cluster Information and Accounting

- Show partitions: `scontrol show partition [-a]`
- Show partition/node usage: `sinfo [-a]`
- Show node capabilities: `sinfo -o "%20N %10A %8c %10m %25f %25G"` (example)
- Show the GUI (requires an X11 session or forwarding): `sview`
- Show accounts for a user: `sacctmgr show assoc where user=<username> format=account`
- Show the default account for a user: `sacctmgr show user <username> format=defaultaccount`
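
As a quick sketch of how a few of these fit together, the snippet below combines the commands above; the username `alice` is a placeholder, not a name from the original.

```sh
# List all partitions, including hidden ones.
scontrol show partition -a

# Show per-node CPUs, memory, features and GRES (custom output format from above).
sinfo -o "%20N %10A %8c %10m %25f %25G"

# Show which accounts the (hypothetical) user "alice" may use, and her default account.
sacctmgr show assoc where user=alice format=account
sacctmgr show user alice format=defaultaccount
```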

### Job Submission and Queueing

- Create a job (overview): Make a Slurm job script, make it executable and submit it (see the example commands after this list and the example job file below).
- Using GPUs: See the example job file below; request GPUs with `--gres=gpu[:<type>]:<n>`.
- Submit a batch job: `sbatch <slurm-file>`
- Start an interactive job: `srun <job options> [--pty] <bash|app>`
- Create an allocation (without connecting to it): `salloc <job options>`
    - Use e.g. `srun --jobid=<id> --pty bash` to connect to it.
- Cancel a specific job: `scancel <jobid>`
- Cancel a set of jobs: `scancel [-t <state>] [-u <user>]`
- Show the job queue: `squeue [-u <user>] [-t <state>] [-p <partition>]`
- Show job details: `scontrol show jobid -dd <jobid>`
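
A minimal sketch of a typical submit/monitor/cancel workflow using the commands above. The script name `job.slurm`, the partition name `gpu`, the GPU type `V100` and the job ID `12345` are placeholder values, not settings from the original.

```sh
# Submit a batch job from a Slurm script (see the example job file below).
sbatch job.slurm

# Show only your own jobs in the queue, then inspect one of them in detail.
squeue -u "$USER"
scontrol show jobid -dd 12345

# Start an interactive shell on a GPU node (placeholder partition and GPU type).
srun --partition=gpu --gres=gpu:V100:1 --time=01:00:00 --pty bash

# Alternatively, create an allocation first and attach a shell to it afterwards.
salloc --partition=gpu --nodes=1 --time=01:00:00
srun --jobid=12345 --pty bash

# Cancel a single job, or all of your pending jobs.
scancel 12345
scancel -t PENDING -u "$USER"
```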

### Example Slurm Job File

```sh
#!/bin/sh

#SBATCH --partition=<partition>
#SBATCH --time=03:00:00
#SBATCH --nodes=2
# #SBATCH --nodelist=compute-2-0-[17-18],compute-5-0-[20-21]
#SBATCH --ntasks-per-node=2
# #SBATCH --exclusive
# #SBATCH --mem=64G
#SBATCH --gres=gpu:V100:2
#SBATCH --job-name="xxx"
#SBATCH --output=log.txt
# #SBATCH --mail-user=user@example.net
# #SBATCH --mail-type=ALL

# Run a command once per task on the allocated nodes
srun uname -a
```
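
With the directives above, Slurm allocates two nodes with two tasks each and two V100 GPUs per node; `srun uname -a` then runs once per task (four times in total), and all output is collected in `log.txt` as set by `--output`.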

{% include footer.md %}