---
title: HPC Cluster
breadcrumbs:
- title: High-Performance Computing (HPC)
---
{% include header.md %}

## Usage

### Cluster Information and Accounting

- Show partitions: `scontrol show partition [-a]`
- Show partition/node usage: `sinfo [-a]`
- Show node capabilities: `sinfo -o "%20N %8c %10m %25f %25G"` (example format)
- Show the GUI (requires an X11 session or X11 forwarding): `sview`
- Show accounts for a user: `sacctmgr show assoc where user=<username> format=account`
- Show the default account for a user: `sacctmgr show user <username> format=defaultaccount`

### Job Submission and Queueing

- Create a job (overview): write a Slurm script, make it executable and submit it; a step-by-step sketch follows the example job file below.
- Using GPUs: request them with `--gres=gpu[:<type>]:<count>`, as in the example Slurm file and the interactive session sketch below.
- Submit a batch (non-blocking) job: `sbatch <slurm_file>`
- Start an interactive (blocking) job: `srun [--pty] <command>`
- Cancel a specific job: `scancel <jobid>`
- Cancel a set of jobs: `scancel [-t <state>] [-u <username>]`
- Show the job queue: `squeue [-u <username>] [-t <state>] [-p <partition>]`
- Show job details: `scontrol show jobid -dd <jobid>`

### Example Slurm Job File

```sh
#!/bin/sh
#SBATCH --partition=<partition>
#SBATCH --time=03:00:00
#SBATCH --nodes=2
## SBATCH --nodelist=compute-2-0-[17-18],compute-5-0-[20-21]
#SBATCH --ntasks-per-node=2
## SBATCH --exclusive
## SBATCH --mem=64G
#SBATCH --gres=gpu:V100:2
#SBATCH --job-name="xxx"
#SBATCH --output=log.txt
## SBATCH --mail-user=user@example.net
## SBATCH --mail-type=ALL

# Run some program once per allocated task (here: two tasks on each of the two nodes)
srun uname -a
```
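### Example: Submitting and Monitoring a Job

To make the overview above concrete, here is a minimal submission walkthrough. It assumes the example job file was saved as `job.slurm` and that job ID `12345` was assigned; both are placeholders, not fixed conventions on the cluster.

```sh
# Make the job script executable (only needed once)
chmod +x job.slurm

# Submit it; Slurm replies with "Submitted batch job <jobid>"
sbatch job.slurm

# Follow your own jobs in the queue
squeue -u $USER

# Inspect the job in detail while it is queued or running
scontrol show jobid -dd 12345

# Cancel it if something went wrong
scancel 12345
```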
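### Example: Interactive GPU Session

For interactive work, `srun` blocks until the resources are granted and then runs the given command inside the allocation. A minimal sketch, assuming the same `<partition>` placeholder and V100 GPUs as in the example job file above:

```sh
# Request one V100 GPU for one hour; --pty attaches the terminal to the shell
srun --partition=<partition> --gres=gpu:V100:1 --time=01:00:00 --pty bash

# Inside the session, Slurm typically exports the granted GPU(s), e.g.:
echo $CUDA_VISIBLE_DEVICES
```

Exiting the shell ends the job and releases the allocation.

{% include footer.md %}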