Håvard Ose Nordstrand 4 years ago
parent
commit
42c2e5f37d
3 changed files with 18 additions and 6 deletions
  1. +12 -1 config/hpc/openmpi.md
  2. +5 -4 config/hpc/slurm.md
  3. +1 -1 index.md

+ 12 - 1
config/hpc/openmpi.md

@@ -34,7 +34,6 @@ A Message Passing Interface (MPI) implementation for C, Fortran, Java, etc.
 ## Run
 
 - Run: `mpirun [opts] <app> [app_opts]`
-    - On certain Slurm clusters it's advised to use `srun` or `srun --mpi=pmix` instead.
 - Set number of processes to use: `-n <n>`
 - Allow more processes than physical cores: `--oversubscribe`
 - Allow running as root (discouraged): `--allow-run-as-root`
@@ -44,4 +43,16 @@ A Message Passing Interface (MPI) implementation for C, Fortran, Java, etc.
     - Specify BTLs exactly: `--mca btl self,vader,tcp`
     - Select the ob1 point-to-point messaging layer (PML): `--mca pml ob1`
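+
+For example, a minimal sketch (the application name `./my_app` is a placeholder) combining the options above:
+
+```sh
+# Run 8 processes, allowing more processes than physical cores and
+# restricting the BTLs to shared memory (vader), self and TCP.
+mpirun -n 8 --oversubscribe --mca btl self,vader,tcp ./my_app
+```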
 
+### Slurm
+
+This applies to clusters using the Slurm workload manager.
+
+- `srun` may be used instead of `mpirun` to let Slurm run the tasks directly through the PMI2/PMIx APIs.
+    - Unlike `mpirun`, this defaults to 1 process per node.
+    - Specify `--mpi={pmi2|pmix}` to explicitly use PMI2 or PMIx.
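+
+As an illustrative sketch (the application name `./my_app` and the node/task counts are placeholders), the same application may be launched through Slurm like this:
+
+```sh
+# Launch 4 tasks across 2 nodes, using PMIx to wire up the MPI processes.
+srun -N 2 -n 4 --mpi=pmix ./my_app
+```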
+
+## Miscellanea
+
+- The environment variables `PMI_SIZE` and `PMI_RANK` (PMI2), or `PMIX_RANK` (PMIx, which has no `PMIX_SIZE`), may be used to get the MPI world size and rank.
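+
+A small shell sketch (assuming the launcher has set one of these variables) for inspecting them from within a task:
+
+```sh
+# Print this task's rank, preferring the PMIx variable if present.
+echo "rank=${PMIX_RANK:-$PMI_RANK} size=${PMI_SIZE:-unknown}"
+```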
+
 {% include footer.md %}

+ 5 - 4
config/hpc/slurm.md

@@ -1,8 +1,8 @@
 ---
-title: Slurm Workload Manager
+title: HPC Cluster
 breadcrumbs:
 - title: Configuration
-- title: IoT
+- title: High-Performance Computing (HPC)
 ---
 {% include header.md %}
 
@@ -14,6 +14,7 @@ breadcrumbs:
     - Show partitions: `scontrol show partition [-a]`
     - Show partition/node usage: `sinfo [-a]`
     - Show node capabilities: `sinfo -o "%20N    %8c    %10m    %25f    %10G"` (example)
+    - Show GUI (requires X11 session/forwarding): `sview`
 - Accounting:
     - Show accounts for user: `sacctmgr show assoc where user=<username> format=account`
     - Show default account for user: `sacctmgr show user <username> format=defaultaccount`
@@ -28,7 +29,7 @@ breadcrumbs:
     - Cancel specific job: `scancel <jobid>`
     - Cancel set of jobs: `scancel [-t <state>] [-u <user>]`
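+
+A few illustrative invocations of the commands above (the job ID and username are placeholders):
+
+```sh
+# Show partition/node usage for all partitions.
+sinfo -a
+# Cancel a specific job (12345 is a placeholder job ID).
+scancel 12345
+# Cancel all pending jobs for a placeholder user.
+scancel -t PENDING -u someuser
+```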
 
-## Example Slurm-File
+### Example Slurm Job File
 
 ```sh
 #!/bin/sh
@@ -46,7 +47,7 @@ breadcrumbs:
 ## SBATCH --mail-user=user@example.net
 # #SBATCH --mail-type=ALL
 
-# Run some program on all processors (or use mpirun)
+# Run some program on one processor on all nodes
 srun uname -a
 ```
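+
+A usage sketch (the file name `example.slurm` is a placeholder) for submitting the job file above and checking its status:
+
+```sh
+# Submit the batch script, then show queued/running jobs for the current user.
+sbatch example.slurm
+squeue -u "$USER"
+```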
 

+ 1 - 1
index.md

@@ -39,9 +39,9 @@ Random collection of config notes and miscellaneous stuff. _Technically not a wi
 
 ### HPC
 
+- [Slurm Workload Manager](/config/hpc/slurm/)
 - [CUDA](/config/hpc/cuda/)
 - [Open MPI](/config/hpc/openmpi/)
-- [Slurm Workload Manager](/config/hpc/slurm/)
 
 ### IoT & Home Automation