Håvard Ose Nordstrand 3 years ago
parent
commit
ecb71bc8ea

+ 12 - 1
config/automation/ansible.md

@@ -6,7 +6,7 @@ breadcrumbs:
 ---
 {% include header.md %}
 
-## Modules-ish
+## Resources
 
 ### General Networking
 
@@ -18,4 +18,15 @@ breadcrumbs:
 - [Ansible IOS platform options](https://docs.ansible.com/ansible/latest/network/user_guide/platform_ios.html)
 - [Ansible ios_config module](https://docs.ansible.com/ansible/latest/modules/ios_config_module.html)
 
+## Configuration
+
+Example `/etc/ansible/ansible.cfg`:
+
+```
+[defaults]
+# Change to "auto" if this path causes problems
+interpreter_python = /usr/bin/python3
+host_key_checking = false
+```
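+
+To verify that the config is picked up, something like the following may be used (a quick sanity check, assuming Ansible is installed):
+
+```
+# Show settings that differ from the defaults
+ansible-config dump --only-changed
+# Quick end-to-end test against the implicit localhost
+ansible localhost -m ping
+```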
+
 {% include footer.md %}

+ 10 - 0
config/general/linux-examples.md

@@ -106,6 +106,16 @@ breadcrumbs:
     - Power save: `echo powersave | ...`
 - Show current core frequencies: `grep "cpu MHz" /proc/cpuinfo | cut -d' ' -f3`
 
+### Profiling
+
+- Command timer (`time`):
+    - Provided both as a shell built-in (`time`) and as a standalone program (`/usr/bin/time`); use the latter, as it supports more options.
+    - Syntax: `/usr/bin/time -vp <command>`
+    - Options:
+        - `-p` for POSIX output (separate lines for real, user and sys time).
+        - `-v` for verbose output with detailed resource usage for the process (max RSS, page faults, context switches etc.).
+    - It gives the wall time, the time spent in user mode and the time spent in kernel mode (see the example below).
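+
+Example using a trivial command (the reported numbers are illustrative):
+
+```
+# POSIX output: wall ("real"), user and kernel ("sys") time on separate lines
+/usr/bin/time -p sleep 2
+# Verbose output: resource usage details in addition to the timings
+/usr/bin/time -v sleep 2
+```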
+
 ### Security
 
 - Show CPU vulnerabilities: `tail -n +1 /sys/devices/system/cpu/vulnerabilities/*`

+ 78 - 0
config/hpc/containers.md

@@ -0,0 +1,78 @@
+---
+title: Containers
+breadcrumbs:
+- title: Configuration
+- title: High-Performance Computing (HPC)
+---
+{% include header.md %}
+
+## Alternative Technologies
+
+### Docker
+
+#### Resources
+
+- Config notes: [Docker](/config/virt-cont/docker/)
+
+#### General Information
+
+- The de facto container solution.
+- It's generally **not recommended for HPC** (see reasons below), but it's fine for running on a local system if that's just more practical for you.
+- Having access to run containers is effectively the same as having root access to the host machine. This is generally not acceptable on shared resources.
+- The daemon adds extra complexity (and overhead/jitter) not required in HPC scenarios.
+- Generally lacks support for typical HPC infrastructure, like batch system integration, non-local resources and high-performance interconnects.
+
+### Singularity
+
+#### Resources
+
+- Homepage: [Singularity](https://singularity.hpcng.org/)
+- Config notes: [Singularity](/config/hpc/singularity/)
+
+#### Information
+
+- Images:
+    - Uses a format called SIF, which is file-based (appropriate for parallel/distributed filesystems).
+    - Managing images equates to managing files.
+    - Images can still be pulled from repositories (which will download them as files).
+    - Supports Docker images, but will automatically convert them to SIF.
+- No daemon process.
+- Does not require or provide root access to use.
+    - Uses the same user, working directory and env vars as the host (**TODO** more info required).
+- Supports Slurm.
+- Supports GPU (NVIDIA CUDA and AMD ROCm).
+
+### NVIDIA Enroot
+
+#### Resources
+
+- Homepage: [NVIDIA Enroot](https://github.com/NVIDIA/enroot)
+
+#### Information
+
+- Fully unprivileged `chroot`. Works similarly to typical container technologies, but removes "unnecessary" parts of the isolation mechanisms. Converts traditional container/OS images into "unprivileged sandboxes".
+- Newer than some other alternatives.
+- Supports using Docker images (and Docker Hub).
+- No daemon.
+- Slurm integration using NVIDIA's [Pyxis](https://github.com/NVIDIA/pyxis) SPANK plugin.
+- Supports NVIDIA GPUs through NVIDIA's [libnvidia-container](https://github.com/nvidia/libnvidia-container) library and CLI utility.
+    - **TODO** AMD ROCm support?
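+
+A minimal workflow sketch (the image is just an example and the generated file name may differ):
+
+```
+# Import a Docker image as a squashfs file (e.g. ubuntu.sqsh)
+enroot import docker://ubuntu
+# Create a container root filesystem from it and run it
+enroot create ubuntu.sqsh
+enroot start ubuntu
+```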
+
+### Shifter
+
+I've never used it. It's very similar to Singularity.
+
+## Best Practices
+
+- Containers should run as unprivileged users (the default for e.g. Singularity, but not Docker).
+- Use trusted base images with pinned versions. The same goes for dependencies.
+- Make your own base images with commonly used tools/libs.
+- Datasets and similar do not need to be copied into the image; they can be bind mounted at runtime instead (see the example after this list).
+- Spack and EasyBuild may be used to simplify building container recipes (Dockerfiles and Singularity definition files), to avoid boilerplate and bad practices.
+- Buildah may be used to build images without a recipe.
+- Dockerfile (or similar) recommendations:
+    - Combine `RUN` commands to reduce the number of layers and thus the size of the image.
+    - To exploit the build cache, place the most rarely changing commands at the top, so rebuilds can reuse the cached layers.
+    - Use multi-stage builds for separate build and run time images/environments.
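+
+Example of bind mounting a dataset at runtime instead of copying it into the image (image names and paths are hypothetical):
+
+```
+# Docker
+docker run --rm -v /cluster/data/mydataset:/data myimage python3 train.py --data-dir /data
+# Singularity
+singularity exec --bind /cluster/data/mydataset:/data myimage.sif python3 train.py --data-dir /data
+```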
+
+{% include footer.md %}

+ 111 - 0
config/hpc/interconnects.md

@@ -0,0 +1,111 @@
+---
+title: Interconnects
+breadcrumbs:
+- title: Configuration
+- title: High-Performance Computing (HPC)
+---
+{% include header.md %}
+
+Using **Debian**, unless otherwise stated.
+
+## Related Pages
+
+- [Linux Switching & Routing](/config/network/linux/)
+
+## General
+
+- The technology should implement RDMA, such that the CPU is not involved in transferring data between hosts (a form of zero-copy). CPU involvement would generally increase latency, increase jitter and limit bandwidth, as well as making the CPU processing power less available to other processing. This implies that the network card must be intelligent/smart and implement hardware offloading for the protocol.
+- The technology should provide a rich communication interface to (userland) applications. The interface should not involve the kernel, as unnecessary buffering and context switches would again lead to increased latency, increased jitter, limited bandwidth and excessive CPU usage. Instead of using the TCP/IP stack on top of the interconnect, InfiniBand and RoCE (for example) provide "verbs" that applications use to communicate with each other over the interconnect.
+- The technology should support one-sided communication.
+- OpenFabrics Enterprise Distribution (OFED) is a unified stack supporting IB, RoCE and iWARP, exposed to applications through a common set of interfaces called the OpenFabrics Interfaces (OFI). Libfabric is the user-space API.
+- UCX is another unifying stack somewhat similar to OFED. Its UCP API is equivalent to OFED's Libfabric API.
+
+## Ethernet
+
+### Info
+
+- More appropriate for commodity clusters due to Ethernet NICs and switches being available off-the-shelf for lower prices.
+- Support for RoCE (RDMA) is recommended to avoid overhead from the kernel TCP/IP stack (as is typically used with plain Ethernet).
+- Ethernet-based interconnects include commodity/plain Ethernet, Internet Wide-Area RDMA Protocol (iWARP), Virtual Protocol Interconnect (VPI) (Infiniband and Ethernet on same card), and RDMA over Converged Ethernet (RoCE) (Infiniband running over Ethernet).
+
+## RDMA Over Converged Ethernet (RoCE)
+
+### Info
+
+- Link layer is Converged Ethernet (CE) (aka data center bridging (DCB)), but the upper protocols are Infiniband (IB).
+- v1 uses an IB network layer and is limited to a single broadcast domain, but v2 uses a UDP/IP network layer (and somewhat transport layer) and is routable over IP routers. Both use an IB transport layer.
+- RoCE requires the NICs and switches to support it.
+- It performs very similarly to InfiniBand, given equal hardware.
+
+## InfiniBand
+
+### Info
+
+- Each physical connection uses a specified number of links/lanes (typically x4), such that the throughput is aggregated.
+- Per-lane throughput:
+    - SDR: 2Gb/s
+    - DDR: 4Gb/s
+    - QDR: 8Gb/s
+    - FDR10: 10Gb/s
+    - FDR: 13.64Gb/s
+    - EDR: 25Gb/s
+    - HDR: 50Gb/s
+    - NDR: 100Gb/s
+    - XDR: 250Gb/s
+- The network adapter is called a host channel adapter (HCA).
+- It's typically switched, but routing between subnets is supported as well.
+- Channel endpoints between applications are called queue pairs (QPs).
+- To avoid invoking the kernel when communicating over a channel, the kernel allocates and pins a memory region that the userland application and the HCA can both access without further kernel involvement. A local key is used by the application to access the HCA buffers and an unencrypted remote key is used by the remote host to access the HCA buffers.
+- Communication uses either channel semantics (the send/receive model, two-sided) or memory semantics (RDMA model, one-sided). It also supports a special type of memory semantics using atomic operations, which is a useful foundation for e.g. distributed locks.
+- Each subnet requires a subnet manager to be running on a switch or a host, which manages the subnet and is queryable by hosts (agents). For very large subnets, it may be appropriate to run it on a dedicated host. It assigns addresses to endpoints, manages routing tables and more.
+
+### Installation
+
+1. Install RDMA: `apt install rdma-core`
+1. Install user-space RDMA stuff: `apt install ibverbs-providers rdmacm-utils infiniband-diags ibverbs-utils`
+1. Install subnet manager (SM): `apt install opensm`
+    - Only one instance is required on the network, but multiple may be used for redundancy.
+    - A master SM is selected based on configured priority, with GUID as a tie breaker.
+1. Setup IPoIB:
+    - Just like for Ethernet, only specifying the IB interface as the L2 device (see the example below the list).
+    - Use an appropriate MTU like 2044.
+1. Make sure ping and ping-pong are working (see examples below).
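+
+Example of a manual (non-persistent) IPoIB setup using iproute2 (the interface name and address are just examples):
+
+```
+ip link set ib0 mtu 2044 up
+ip addr add 10.10.10.1/24 dev ib0
+ip a show ib0
+```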
+
+### Usage
+
+- Show IPoIB status: `ip a`
+- Show local devices:
+    - GUIDs: `ibv_devices`
+    - Basics (1): `ibstatus`
+    - Basics (2): `ibstat`
+    - Basics (3): `ibv_devinfo`
+- Show link statuses for network: `iblinkinfo`
+- Show subnet nodes:
+    - Hosts: `ibhosts`
+    - Switches: `ibswitches`
+    - Routers: `ibrouters`
+- Show active subnet manager(s): `sminfo`
+- Show subnet topology: `ibnetdiscover`
+- Show port counters: `perfquery`
+
+#### Testing
+
+- Ping:
+    - Server: `ibping -S`
+    - Client: `ibping -G <guid>`
+- Ping-pong:
+    - Server: `ibv_rc_pingpong -d <device> [-n <iters>]`
+    - Client: `ibv_rc_pingpong [-n <iters>] <ip>`
+- Other tools:
+    - qperf
+    - perftest
+- Diagnose with ibutils:
+    - Requires the `ibutils` package.
+    - Diagnose fabric: `ibdiagnet -ls 10 -lw 4x` (example)
+    - Diagnose path between two nodes: `ibdiagpath -l 65,1` (example)
+
+## NVLink & NVSwitch
+
+See [CUDA](/se/general/cuda/).
+
+{% include footer.md %}

+ 53 - 0
config/hpc/singularity.md

@@ -0,0 +1,53 @@
+---
+title: Singularity
+breadcrumbs:
+- title: Configuration
+- title: High-Performance Computing (HPC)
+---
+{% include header.md %}
+
+A container technology for HPC.
+
+## Information
+
+- For more general information and comparison to other HPC container technologies, see [Containers](/config/hpc/containers/).
+
+## Configuration
+
+**TODO**
+
+## Usage
+
+### Running
+
+- Run command: `singularity exec <opts> <img> <cmd>`
+- Run interactive shell: `singularity shell <opts> <img>`
+- Mounts:
+    - The current directory is mounted and used as the working directory by default.
+- Env vars:
+    - Env vars are copied from the host, but `--cleanenv` may be used to avoid that.
+    - Extra ones can be specified using `--env <var>=<val>`.
+- GPUs:
+    - See the GPU section below and the combined example after this list.
+    - Specify `--nv` (NVIDIA) or `--rocm` (AMD) to expose GPUs.
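+
+A combined example (the image and script names are hypothetical):
+
+```
+# Run a script with NVIDIA GPU access and an extra bind mount
+singularity exec --nv --bind /cluster/data:/data myimage.sif python3 /data/train.py
+```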
+
+### Images
+
+- Pull image from repo:
+    - Will place the image as a SIF file (`<image>_<tag>.sif`) in the current directory.
+    - Docker Hub: `singularity pull docker://<img>:<tag>`
+
+### GPUs
+
+- The GPU driver library must be exposed in the container and `LD_LIBRARY_PATH` must be updated.
+- Specify `--nv` (NVIDIA) or `--rocm` (AMD) when running a container.
+
+### MPI
+
+- Using the "bind approach", where MPI and the interconnect is bind mounted into the container.
+- MPI is installed in the container in order to build the application with dynamic linking.
+- MPI is installed on the host such that the application can dynamically load it at run time.
+- The MPI implementations must be of the same family and preferably the same version (for ABI compatibility). While MPICH, IntelMPI, MVAPICH and CrayMPICH use the same ABI, Open MPI does not comply with that ABI.
+- When running the application, both the MPI implementation and the interconnect must be bind mounted into the container and an appropriate `LD_LIBRARY_PATH` must be provided for the MPI libraries (see the sketch below). This may be statically configured by the system admin.
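+
+A sketch of running with the bind approach (all paths and names are assumptions, with MPICH installed under `/opt/mpich` on the host):
+
+```
+# Launch ranks on the host; each rank runs the application inside the container
+mpirun -n 4 singularity exec \
+    --bind /opt/mpich:/opt/mpich \
+    --env LD_LIBRARY_PATH=/opt/mpich/lib \
+    myapp.sif /opt/app/myapp
+```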
+
+{% include footer.md %}

+ 7 - 14
config/linux-server/debian.md

@@ -108,10 +108,9 @@ The first steps may be skipped if already configured during installation (i.e. n
     - Fix YAML formatting globally: In `/etc/vim/vimrc.local`, add `autocmd FileType yaml setlocal ts=2 sts=2 sw=2 expandtab`.
 1. Add mount options:
     - Setup hidepid:
-        - **TODO** Use existing `adm` group instead of creating a new one?
-        - Add PID monitor group: `groupadd -g 500 hidepid` (example GID)
-        - Add your personal user to the PID monitor group: `usermod -aG hidepid <user>`
-        - Enable hidepid in `/etc/fstab`: `proc /proc proc defaults,hidepid=2,gid=500 0 0`
+        - Note: The `adm` group will be granted access.
+        - Add your personal user to the PID monitor group: `usermod -aG adm <user>`
+        - Enable hidepid in `/etc/fstab`: `proc /proc proc defaults,hidepid=2,gid=<adm-gid> 0 0` (using the numerical GID of `adm`, as shown by e.g. `getent group adm`)
     - (Optional) Disable the tiny swap partition added by the guided installer by commenting it in the fstab.
     - (Optional) Setup extra mount options: See [Storage](system.md).
     - Run `mount -a` to validate fstab.
@@ -121,7 +120,7 @@ The first steps may be skipped if already configured during installation (i.e. n
     - Add the relevant groups (using `usermod -aG <group> <user>`):
         - `sudo` for sudo access.
         - `systemd-journal` for system log access.
-        - `hidepid` (whatever it's called) if using hidepid, to see all processes.
+        - `adm` for hidepid, to see all processes (if using hidepid).
     - Add your personal SSH pubkey to `~/.ssh/authorized_keys` and fix the owner and permissions (700 for dir, 600 for file).
         - Hint: Get `https://github.com/<user>.keys` and filter the results.
     - Try logging in remotely and gain root access through sudo.
@@ -230,12 +229,7 @@ Prevent enabled (and potentially untrusted) interfaces from accepting router adv
     - (Optional) `DNSSEC`: Set to `no` to disable (only if you have a good reason to, like avoiding the chicken-and-egg problem with DNSSEC and NTP).
 1. (Optional) If you're hosting a DNS server on this machine, set `DNSStubListener=no` to avoid binding to port 53.
 1. Enable the service: `systemctl enable --now systemd-resolved.service`
-1. Fix `/etc/resolv.conf`:
-    - Note: The systemd-generated one is `/run/systemd/resolve/stub-resolv.conf`.
-    - Note: Simply symlinking `/etc/resolv.conf` to the systemd one will cause dhclient to overwrite it if using DHCP for any interfaces, so don't do that.
-    - Note: This method may cause `/etc/resolv.conf` to become outdated if the systemd one changes for some reason (e.g. if the search domains change).
-    - After configuring and starting resolved, copy (not link) `resolv.conf`: `cp /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf`
-    - Make it immutable so dhclient can't update it: `chattr +i /etc/resolv.conf`
+1. Link `/etc/resolv.conf`: `ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf`
 1. Check status: `resolvectl`
 
 ##### Using resolv.conf (Alternative 2)
@@ -308,12 +302,11 @@ Everything here is optional.
     - Install: `apt install lynis`
     - Run: `lynis audit system`
 - MOTD:
-    - Clear `/etc/motd` and `/etc/issue`.
-    - Download [dmotd.sh](https://github.com/HON95/scripts/blob/master/server/linux/general/dmotd.sh) to `/etc/profile.d/`.
+    - Clear `/etc/motd`, `/etc/issue` and `/etc/issue.net`.
+    - Download [dmotd.sh](https://github.com/HON95/scripts/blob/master/linux/login/dmotd.sh) to `/etc/profile.d/`.
     - Install the dependencies: `neofetch lolcat`
     - Add an ASCII art (or Unicode art) logo to `/etc/logo`, using e.g. [TAAG](http://patorjk.com/software/taag/).
     - (Optional) Add a MOTD to `/etc/motd`.
-    - (Optional) Clear or change the pre-login message in `/etc/issue`.
     - Test it: `su - <some-normal-user>`
 - Setup monitoring:
     - Use Prometheus with node exporter or something and set up alerts.

+ 0 - 69
config/linux-server/networking.md

@@ -1,69 +0,0 @@
----
-title: Linux Server Networking
-breadcrumbs:
-- title: Configuration
-- title: Linux Server
----
-{% include header.md %}
-
-Using **Debian**, unless otherwise stated.
-
-### TODO
-{:.no_toc}
-
-- Migrate stuff from Debian page.
-- Add link to Linux router page. Maybe combine.
-- Add ethtool notes from VyOS.
-
-## Related Pages
-
-- [Linux Switching & Routing](/config/network/linux/)
-
-## InfiniBand
-
-### Installation
-
-1. Install RDMA: `apt install rdma-core`
-1. Install user-space RDMA stuff: `apt install ibverbs-providers rdmacm-utils infiniband-diags ibverbs-utils`
-1. Install subnet manager (SM): `apt install opensm`
-    - Only one instance is required on the network, but multiple may be used for redundancy.
-    - A master SM is selected based on configured priority, with GUID as a tie breaker.
-1. Setup IPoIB:
-    - Just like for Ethernet. Just specify the IB interface as the L2 device.
-    - Use an appropriate MTU like 2044.
-1. Make sure ping and ping-pong is working (see examples below).
-
-### Usage
-
-- Show IPoIB status: `ip a`
-- Show local devices:
-    - GUIDs: `ibv_devices`
-    - Basics (1): `ibstatus`
-    - Basics (2): `ibstat`
-    - Basics (3): `ibv_devinfo`
-- Show link statuses for network: `iblinkinfo`
-- Show subnet nodes:
-    - Hosts: `ibhosts`
-    - Switches: `ibswitches`
-    - Routers: `ibrouters`
-- Show active subnet manager(s): `sminfo`
-- Show subnet topology: `ibnetdiscover`
-- Show port counters: `perfquery`
-
-#### Testing
-
-- Ping:
-    - Server: `ibping -S`
-    - Client: `ibping -G <guid>`
-- Ping-pong:
-    - Server: `ibv_rc_pingpong -d <device> [-n <iters>]`
-    - Client: `ibv_rc_pingpong [-n <iters>] <ip>`
-- Other tools:
-    - qperf
-    - perftest
-- Diagnose with ibutils:
-    - Requires the `ibutils` package.
-    - Diagnose fabric: `ibdiagnet -ls 10 -lw 4x` (example)
-    - Diagnose path between two nodes: `ibdiagpath -l 65,1` (example)
-
-{% include footer.md %}

+ 3 - 1
index.md

@@ -40,8 +40,11 @@ Random collection of config notes and miscellaneous stuff. _Technically not a wi
 ### HPC
 
 - [Slurm Workload Manager](/config/hpc/slurm/)
+- [Containers](/config/hpc/containers/)
+- [Singularity](/config/hpc/singularity/)
 - [CUDA](/config/hpc/cuda/)
 - [Open MPI](/config/hpc/openmpi/)
+- [Interconnects](/config/hpc/interconnects/)
 
 ### IoT & Home Automation
 
@@ -55,7 +58,6 @@ Random collection of config notes and miscellaneous stuff. _Technically not a wi
 - [Storage](/config/linux-server/storage/)
 - [Storage: ZFS](/config/linux-server/storage-zfs/)
 - [Storage: Ceph](/config/linux-server/storage-ceph/)
-- [Networking](/config/linux-server/networking/)
 
 ### Media
 

+ 13 - 1
se/hpc/cuda.md

@@ -243,7 +243,19 @@ breadcrumbs:
 
 ### Nsight Compute
 
-- May be run from command line (`ncu`) or using the graphical application.
+- May be run from command line (`ncu`) or using the graphical application (`ncu-ui`).
 - Kernel replays: In order to run all profiling methods for a kernel execution, Nsight might have to run the kernel multiple times by storing the state before the first kernel execution and restoring it for every replay. It does not restore any host state, so in case of host-device communication during the execution, this is likely to put the application in an inconsistent state and cause it to crash or give incorrect results. To rerun the whole application (aka "application mode") instead of transparently replaying individual kernels (aka "kernel mode"), specify `--replay-mode=application` (or the equivalent option in the GUI).
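+
+Command-line examples (the application and report names are arbitrary):
+
+```
+# Profile all kernels and write a report that can be opened in ncu-ui
+ncu -o myreport ./myapp
+# Rerun the whole application instead of replaying individual kernels
+ncu --replay-mode=application -o myreport ./myapp
+```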
 
+## Hardware
+
+### NVLink & NVSwitch
+
+- Interconnect for connecting NVIDIA GPUs and NICs/HCAs as a mesh within a node, because PCIe was too limited.
+- NVLink alone is limited to only eight GPUs, but NVSwitches allow connecting more.
+- A bidirectional "link" consists of two unidirectional "sub-links", which each contain eight differential pairs (i.e. lanes). Each device may support multiple links.
+- NVLink transfer rate per differential pair:
+    - NVLink 1.0 (Pascal): 20Gb/s
+    - NVLink 2.0 (Volta): 25Gb/s
+    - NVLink 3.0 (Ampere): 50Gb/s
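+
+Commands for inspecting NVLink on a node (requires the NVIDIA driver):
+
+```
+# Show the GPU/NIC topology matrix (NVLink, PCIe and NUMA affinity)
+nvidia-smi topo -m
+# Show per-GPU NVLink status and speeds
+nvidia-smi nvlink --status
+```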
+
 {% include footer.md %}