
Merge branch 'master' of github.com:HON95/wiki

Håvard O. Nordstrand committed 3 years ago
commit b3d5a2ebbc

+ 12 - 1
config/automation/ansible.md

@@ -6,7 +6,7 @@ breadcrumbs:
 ---
 {% include header.md %}
 
-## Modules-ish
+## Resources
 
 ### General Networking
 
@@ -18,4 +18,15 @@ breadcrumbs:
 - [Ansible IOS platform options](https://docs.ansible.com/ansible/latest/network/user_guide/platform_ios.html)
 - [Ansible ios_config module](https://docs.ansible.com/ansible/latest/modules/ios_config_module.html)
 
+## Configuration
+
+Example `/etc/ansible/ansible.cfg`:
+
+```
+[defaults]
+# Change to "auto" if this path causes problems
+interpreter_python = /usr/bin/python3
+host_key_checking = false
+```
+
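+A quick way to verify that the config and interpreter setting are picked up (the inventory hosts are just examples):
+
+```sh
+# Show which config file is in use
+ansible --version
+# Ad-hoc ping using an inline inventory (note the trailing comma)
+ansible all -i 'node1.example.net,node2.example.net,' -m ping
+```
+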
 {% include footer.md %}

+ 9 - 1
config/general/computer-testing.md

@@ -32,10 +32,18 @@ breadcrumbs:
 Example usage:
 
 ```sh
-# 1 stressor, 75% of memory, with verification, for 10 minutes
+# 1 stressor, 75% of memory (TODO: this also works fine with 100% for some reason; find out what the percentage actually means), with verification, for 10 minutes
 stress-ng --vm 1 --vm-bytes 75% --vm-method all --verify -t 10m -v
 ```
 
+### Error Detection and Correction (EDAC) (Linux)
+
+- Only available on systems with ECC RAM, AFAIK.
+- Check the syslog: `journalctl | grep 'EDAC' | grep -i 'error'`
+- Show corrected (CE) and uncorrected (UE) errors per memory controller and DIMM slot: `grep '.*' /sys/devices/system/edac/mc/mc*/dimm*/dimm_*_count`
+- Show DIMM slot names to help locate the faulty DIMM: `dmidecode -t memory | grep 'Locator:.*DIMM.*'`
+- When changing the DIMM, make sure to run Memtest86 or similar both before and after to validate that the errors go away.
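+
+A small sketch for listing only the counters that are nonzero (same sysfs paths as above):
+
+```sh
+# Print only the DIMM error counters with errors
+for f in /sys/devices/system/edac/mc/mc*/dimm*/dimm_*_count; do
+    n=$(cat "$f")
+    [ "$n" -ne 0 ] && echo "$f: $n"
+done
+```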
+
 ## Storage
 
 ### Fio (Linux)

+ 92 - 6
config/general/linux-examples.md

@@ -8,6 +8,53 @@ breadcrumbs:
 
 ## Commands
 
+### General Monitoring
+
+- For more specific monitoring, see the other sections.
+- `htop`:
+    - ncurses-based process viewer like `top`, but prettier and more interactive.
+    - Install (APT): `apt install htop`
+    - Usage: `htop` (interactive)
+- `glances`:
+    - ncurses-based viewer for e.g. basic system info, top-like process info, network traffic and disk traffic.
+    - Homepage: [Glances](https://nicolargo.github.io/glances/)
+    - Install (PyPI for latest version): `pip3 install glances`
+    - Usage: `glances` (interactive)
+- `dstat`:
+    - A versatile replacement for vmstat, iostat and ifstat (according to itself).
+    - Prints scrolling output for showing a lot of types of general metrics, one line of columns for each time step.
+    - Usage: `dstat <options> [interval] [count]`
+        - Default interval is 1s, default count is unlimited.
+        - The values shown are the average since the last interval ended.
+        - For intervals over 1s, the last row will update itself each second until the delay has been reached and a new line is created. The values shown are averages since the last final value (when the last line was finalized), so e.g. a 10s interval gives a final line showing a 10s average.
+        - The first line is always a snapshot, i.e. all rate-based metrics are 0 or some absolute value.
+        - If any column options are provided, they will replace the default ones and are displayed in the order specified.
+    - Special options:
+        - `-C <>`: Comma-separated list of CPUs/cores to show for, including `total`.
+        - `-D <>`: Same but for disks.
+        - `-N <>`: Same but for NICs.
+        - `-f`: Show stats for all devices (not aggregated).
+    - Useful metrics:
+        - `-t`: Current time.
+        - `-p`: Process stats (by runnable, uninterruptible, new) (changes per second).
+        - `-y`: Total interrupt and context switching stats (by interrupts, context switches) (events per second).
+        - `-l`: Load average stats (1 min, 5 mins, 15 mins) (total system load multiplied by number of cores).
+        - `-c`: CPU stats (by system, user, idle, wait) (percentage of total).
+        - `--cpu-use`: Per-CPU usage (by CPU) (percentage).
+        - `-m`: Memory stats (by used, buffers, cache, free) (bytes).
+        - `-g`: Paging stats (by in, out) (count per second).
+        - `-s`: Swap stats (by used, free) (total).
+        - `-r`: Storage request stats (by read, write) (requests per second).
+        - `-d`: Storage throughput stats (by read, write) (bytes per second).
+        - `-n`: Network throughput stats (by recv, send) (bytes per second).
+        - `--socket`: Network socket stats (by total, tcp, udp, raw, ip-fragments)
+    - Useful plugins (metrics):
+        - `--net-packets`: Network request stats (by recv, send) (packets per second).
+    - Examples:
+        - General overview (CPU, RAM, ints/csws, disk, net): `dstat -tcmyrdn --net-packets 60`
+        - Network overview (CPU, ints/csws, net): `dstat -tcyn --net-packets 60`
+        - Process overview (CPU, RAM, ints/csws, paging, process, sockets): `dstat -tcmygp --socket 60`
+
 ### File Systems and Logical Volume Managers
 
 - Partition disk: `gdisk <dev>` or `fdisk <dev>`
@@ -81,7 +128,7 @@ breadcrumbs:
 - Show sockets:
     - `netstat -tulpn`
         - `t`/`u` for TCP and UDP, `l` for listening, `p` for showing the owning process, `n` for numerical port numbers.
-    - `ss <options>`
+    - `ss -tulpn` (replaces netstat version)
 - Show interface stats:
     - `ip -s link`
     - `netstat -i`
@@ -99,6 +146,31 @@ breadcrumbs:
     - `nstat`
     - `netstat -s` (statistics)
 
+#### Tcpdump
+
+- Typical usage: `tcpdump -i <interface> -nn -v [filter]`
+- Options:
+    - `-w <>.pcap`: Write to capture file instead of formatted to STDOUT.
+    - `-i <if>`: Interface to listen on. Defaults to a random-ish interface.
+    - `-nn`: Don't resolve hostnames or ports.
+    - `-s<n>`: How many bytes of each packet to capture. Use 0 for unlimited (full packets).
+    - `-v`/`-vv`: Details to show about packets. More V's for more details.
+    - `-l`: Line buffered mode, for better stability when piping to e.g. grep.
+- Filters:
+    - Can consist of complex logical statements using parentheses, `not`/`!`, `and`/`&&` and `or`/`||`. Make sure to quote the filter to avoid interference from the shell.
+    - Protocol: `ip`, `ip6`, `icmp`, `icmp6`, `tcp`, `udp`
+    - Ports: `port <n>`
+    - IP address: `host <addr>`, `dst <addr>`, `src <addr>`
+    - IPv6 router solicitations and advertisements: `icmp6 and (ip6[40] = 133 or ip6[40] = 134)` (133 for RS and 134 for RA)
+    - IPv6 neighbor solicitations and advertisements: `icmp6 and (ip6[40] = 135 or ip6[40] = 136)` (135 for NS and 136 for NA)
+    - DHCPv4: `ip and udp and (port 67 and port 68)`
+    - DHCPv6: `ip6 and udp and (port 547 and port 546)`
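+
+Examples combining the options and filters above (the interface name is just an example):
+
+```sh
+# Watch DHCPv6 traffic, with details, without resolving names
+tcpdump -i eth0 -nn -v 'ip6 and udp and (port 547 and port 546)'
+# Capture full IPv6 RS/RA packets to a file for later analysis
+tcpdump -i eth0 -nn -s0 -w ra.pcap 'icmp6 and (ip6[40] = 133 or ip6[40] = 134)'
+```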
+
+### Memory
+
+- NUMA stats:
+    - `numastat` (from package `numactl`)
+
 ### Performance and Power Efficiency
 
 - Set the CPU frequency scaling governor mode:
@@ -106,6 +178,23 @@ breadcrumbs:
     - Power save: `echo powersave | ...`
 - Show current core frequencies: `grep "cpu MHz" /proc/cpuinfo | cut -d' ' -f3`
 
+### Profiling
+
+- `time` (timing commands):
+    - Provided both as a shell built-in `time` and as `/usr/bin/time`; use the latter.
+    - Typical usage: `/usr/bin/time -p <command>`
+    - Options:
+        - `-p` for POSIX output (one line per time)
+        - `-v` for interesting system info about the process.
+    - It gives the wall time, the time spent in user mode and the time spent in kernel mode.
+- `strace` (trace system calls and signals):
+    - In standard mode, it runs the full command and traces/prints all syscalls (including arguments and return value).
+    - Syntax: `strace [options] <command>`
+    - Useful options:
+        - `-c`: Show summary/overview only. (Hints at which syscalls are worth looking more into.)
+        - `-f`: Trace forked child processes too.
+        - `-e trace=<syscalls>`: Only trace the specified comma-separated list of syscalls.
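+
+Examples (the profiled commands are arbitrary):
+
+```sh
+# Wall clock, user and kernel time for a command
+/usr/bin/time -p du -sh /usr
+# Syscall summary (including forked children), written to a file
+strace -c -f -o /tmp/strace-summary.txt du -sh /usr
+```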
+
 ### Security
 
 - Show CPU vulnerabilities: `tail -n +1 /sys/devices/system/cpu/vulnerabilities/*`
@@ -133,14 +222,11 @@ breadcrumbs:
     - `iostat [-c] [-t] [interval]`
 - Monitor processes:
     - `ps` (e.g. `ps aux` or `ps ax o uid,user:12,pid,comm`)
-- Monitor a mix of things:
-    - `htop`
-    - `glances`
-    - `ytop`
+- Monitor a mix of things: See the "general monitoring" section.
 - Monitor interrupts:
     - `irqtop`
     - `watch -n0.1 /proc/interrupts`
-- Stress test with stress-mg:
+- Stress test with stress-ng:
     - Install (Debian): `apt install stress-ng`
     - Stress CPU: `stress-ng -c $(nproc) -t 600`
 

+ 77 - 0
config/hpc/containers.md

@@ -0,0 +1,77 @@
+---
+title: Containers
+breadcrumbs:
+- title: Configuration
+- title: High-Performance Computing (HPC)
+---
+{% include header.md %}
+
+## Alternative Technologies
+
+### Docker
+
+#### Resources
+
+- Config notes: [Docker](/config/virt-cont/docker/)
+
+#### General Information
+
+- The de facto container solution.
+- It's generally **not recommended for HPC** (see reasons below), but it's fine for running on a local system if that's just more practical for you.
+- Having access to run containers is effectively the same as having root access to the host machine. This is generally not acceptable on shared resources.
+- The daemon adds extra complexity (and overhead/jitter) not required in HPC scenarios.
+- Generally lacks typical HPC architecture support like batch integration, non-local resources and high-performance interconnects.
+
+### Singularity
+
+#### Resources
+
+- Homepage: [Singularity](https://singularity.hpcng.org/)
+- Config notes: [Singularity](/config/hpc/singularity/)
+
+#### Information
+
+- Images:
+    - Uses a format called SIF, which is file-based (appropriate for parallel/distributed filesystems).
+    - Managing images equates to managing files.
+    - Images can still be pulled from repositories (which will download them as files).
+    - Supports Docker images, but will automatically convert them to SIF.
+- No daemon process.
+- Does not require or provide root access to use.
+- Uses the same user, working directory and env vars as the host (**TODO** more info required).
+- Supports Slurm.
+- Supports GPU (NVIDIA CUDA and AMD ROCm).
+
+### NVIDIA Enroot
+
+#### Resources
+
+- Homepage: [NVIDIA Enroot](https://github.com/NVIDIA/enroot)
+
+#### Information
+
+- Fully unprivileged `chroot`. Works similarly to typical container technologies, but removes "unnecessary" parts of the isolation mechanisms. Converts traditional container/OS images into "unprivileged sandboxes".
+- Newer than some other alternatives.
+- Supports using Docker images (and Docker Hub).
+- No daemon.
+- Slurm integration using NVIDIA's [Pyxis](https://github.com/NVIDIA/pyxis) SPANK plugin.
+- Supports NVIDIA GPUs through NVIDIA's [libnvidia-container](https://github.com/nvidia/libnvidia-container) library and CLI utility (_official_ support from NVIDIA unlike certain other solutions).
+
+### Shifter
+
+I've never used it. It's apparently very similar to Singularity.
+
+## Best Practices
+
+- Containers should run as users (the default for e.g. Singularity, but not Docker).
+- Use trusted base images with pinned versions. The same goes for dependencies.
+- Make your own base images with commonly used tools/libs.
+- Datasets and similar do not need to be copied into the image; they can be bind mounted at runtime instead.
+- Spack and EasyBuild may be used to simplify building container recipes (Dockerfiles and Singularity definition files), to avoid boilerplate and bad practices.
+- Buildah may be used to build images without a recipe.
+- Dockerfile (or similar) recommendations:
+    - Combine `RUN` commands to reduce the number of layers and thus the size of the image.
+    - To exploit the build cache, place the most cacheable commands at the top to avoid running them again on rebuilds.
+    - Use multi-stage builds for separate build and run time images/environments.
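+
+A minimal multi-stage Dockerfile sketch illustrating a few of these points (base images, paths and the build command are just examples):
+
+```
+# Build stage: heavier image with the full build toolchain
+FROM golang:1.16 AS build
+WORKDIR /src
+COPY . .
+RUN go build -o /out/app .
+
+# Final stage: small, pinned base image with only the built app
+FROM debian:buster-slim
+COPY --from=build /out/app /usr/local/bin/app
+ENTRYPOINT ["/usr/local/bin/app"]
+```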
+
+{% include footer.md %}

+ 9 - 7
config/hpc/cuda.md

@@ -11,7 +11,8 @@ NVIDIA CUDA (Compute Unified Device Architecture) Toolkit, for programming CUDA-
 ### Related Pages
 {:.no_toc}
 
-- [CUDA (software engineering)](/config/se/general/cuda.md)
+- [HIP](/config/hpc/hip/)
+- [CUDA (software engineering)](/se/general/cuda/)
 
 ## Resources
 
@@ -22,7 +23,7 @@ NVIDIA CUDA (Compute Unified Device Architecture) Toolkit, for programming CUDA-
 
 ## Setup
 
-### Linux
+### Linux Installation
 
 The toolkit on Linux can be installed in different ways:
 
@@ -34,19 +35,20 @@ If an NVIDIA driver is already installed, it must match the CUDA version.
 
 Downloads: [CUDA Toolkit Download (NVIDIA)](https://developer.nvidia.com/cuda-downloads)
 
-#### Ubuntu (NVIDIA CUDA Repo)
+#### Ubuntu w/ NVIDIA's CUDA Repo
 
 1. Follow the steps to add the NVIDIA CUDA repo: [CUDA Toolkit Download (NVIDIA)](https://developer.nvidia.com/cuda-downloads)
     - But don't install `cuda` yet.
-1. Remove anything NVIDIA or CUDA from the system to avoid conflicts: `apt purge --autoremove cuda nvidia-* libnvidia-*`
+1. Remove anything NVIDIA or CUDA from the system to avoid conflicts: `apt purge --autoremove 'cuda' 'cuda-' 'nvidia-*' 'libnvidia-*'`
     - Warning: May break your PC. There may be better ways to do this.
 1. Install CUDA from the new repo (includes the NVIDIA driver): `apt install cuda`
-1. Setup path: In `/etc/environment`, append `:/usr/local/cuda/bin` to the end of the PATH list.
+1. Setup PATH: `echo 'export PATH=$PATH:/usr/local/cuda/bin' | sudo tee -a /etc/profile.d/cuda.sh`
 
 ### Docker Containers
 
-- Docker containers may run NVIDIA applications using the NVIDIA runtime for Docker.
-- **TODO**
+Docker containers may run NVIDIA applications using the NVIDIA runtime for Docker.
+
+See [Docker](/config/virt-cont/docker/).
 
 ### DCGM
 

+ 33 - 0
config/hpc/enroot.md

@@ -0,0 +1,33 @@
+---
+title: Enroot
+breadcrumbs:
+- title: Configuration
+- title: High-Performance Computing (HPC)
+---
+{% include header.md %}
+
+A container technology for HPC, made by NVIDIA.
+
+## Information
+
+- For more general information and comparison to other HPC container technologies, see [Containers](/config/hpc/containers/).
+
+## Configuration
+
+**TODO**
+
+## Usage
+
+### Running
+
+- **TODO**
+
+### Images
+
+- **TODO**
+
+### GPUs
+
+- **TODO**
+
+{% include footer.md %}

+ 67 - 0
config/hpc/hip.md

@@ -0,0 +1,67 @@
+---
+title: HIP
+breadcrumbs:
+- title: Configuration
+- title: High-Performance Computing (HPC)
+---
+{% include header.md %}
+
+HIP (Heterogeneous-Compute Interface for Portability) is AMD ROCm's runtime API and kernel language, which is compilable for both AMD (through ROCm) and NVIDIA (through CUDA) GPUs.
+Compared to OpenCL (which is also supported by both NVIDIA and AMD), it's much more similar to CUDA (making it _very_ easy to port CUDA code) and allows using existing profiling tools and similar for CUDA and ROCm.
+
+### Related Pages
+{:.no_toc}
+
+- [ROCm](/config/hpc/rocm/)
+- [CUDA](/config/hpc/cuda/)
+
+## Resources
+
+- [HIP Installation (AMD ROCm Docs)](https://rocmdocs.amd.com/en/latest/Installation_Guide/HIP-Installation.html)
+
+## Info
+
+- HIP code can be compiled for AMD ROCm using the HIP-Clang compiler or for CUDA using the NVCC compiler.
+- If using both CUDA with an NVIDIA GPU and ROCm with an AMD GPU in the same system, HIP seems to prefer ROCm with the AMD GPU when building applications. I have not found a way to change the target platform (**TODO**).
+
+## Setup
+
+### Linux Installation
+
+Using **Ubuntu 20.04 LTS**.
+
+#### Common Steps Before
+
+1. Add the ROCm package repo (overlaps with ROCm installation):
+    1. Install requirements: `sudo apt install libnuma-dev wget gnupg2`
+    1. Add public key: `wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -`
+    1. Add repo: `echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/debian/ ubuntu main' | sudo tee /etc/apt/sources.list.d/rocm.list`
+    1. Update cache: `sudo apt update`
+
+#### Steps for NVIDIA Platforms
+
+1. Install the CUDA toolkit and the NVIDIA driver: See [CUDA](/config/hpc/cuda/).
+1. Install: `sudo apt install hip-nvcc`
+
+#### Steps for AMD Platforms
+
+1. Install stuff: `sudo apt install mesa-common-dev clang comgr`
+1. Install ROCm: See [ROCm](/config/hpc/rocm/).
+
+#### Common Steps After
+
+1. Fix symlinks and PATH:
+    - (NVIDIA platforms only) CUDA symlink (`/usr/local/cuda`): Should already point to the right thing.
+    - (AMD platforms only) ROCm symlink (`/opt/rocm`): `sudo ln -s /opt/rocm-4.2.0 /opt/rocm` (example)
+    - Add to PATH: `echo 'export PATH=$PATH:/opt/rocm/bin:/opt/rocm/rocprofiler/bin:/opt/rocm/opencl/bin' | sudo tee -a /etc/profile.d/rocm.sh`
+1. Verify installation: `/opt/rocm/bin/hipconfig --full`
+1. (Optional) Try to build the square example program: [square (ROCm HIP samples)](https://github.com/ROCm-Developer-Tools/HIP/tree/master/samples/0_Intro/square)
+
+## Usage and Tools
+
+- Show system info:
+    - Show lots of HIP stuff: `hipconfig --config`
+    - Show platform (`amd` or `nvidia`): `hipconfig --platform`
+- Convert CUDA program to HIP: `hipify-perl input.cu > output.cpp`
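+
+Example workflow for porting and building a small CUDA program (file names are just examples):
+
+```sh
+# Convert the CUDA source to HIP and compile it with the HIP compiler driver
+hipify-perl square.cu > square.cpp
+hipcc square.cpp -o square
+./square
+```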
+
+{% include footer.md %}

+ 111 - 0
config/hpc/interconnects.md

@@ -0,0 +1,111 @@
+---
+title: Interconnects
+breadcrumbs:
+- title: Configuration
+- title: High-Performance Computing (HPC)
+---
+{% include header.md %}
+
+Using **Debian**, unless otherwise stated.
+
+## Related Pages
+
+- [Linux Switching & Routing](/config/network/linux/)
+
+## General
+
+- The technology should implement RDMA, such that the CPU is not involved in transferring data between hosts (a form of zero-copy). CPU involvement would generally increase latency, increase jitter and limit bandwidth, as well as making the CPU processing power less available to other processing. This implies that the network card must be intelligent/smart and implement hardware offloading for the protocol.
+- The technology should provide a rich communication interface to (userland) applications. The interface should not involve the kernel as unnecessary buffering and context switches would again lead to increased latency, increased jitter, limited bandwidth and excessive CPU usage. Instead of using the TCP/IP stack on top of the interconnect, Infiniband and RoCE (for example) provide "verbs" that the application uses to communicate with applications over the interconnect.
+- The technology should support one-sided communication.
+- OpenFabrics Enterprise Distribution (OFED) is a unified stack for supporting IB, RoCE and iWARP in a unified set of interfaces called the OpenFabrics Interfaces (OFI) to applications. Libfabric is the user-space API.
+- UCX is another unifying stack somewhat similar to OFED. Its UCP API is equivalent to OFED's Libfabric API.
+
+## Ethernet
+
+### Info
+
+- More appropriate for commodity clusters due to Ethernet NICs and switches being available off-the-shelf for lower prices.
+- Support for RoCE (RDMA) is recommended to avoid overhead from the kernel TCP/IP stack (as is typically used wrt. Ethernet).
+- Ethernet-based interconnects include commodity/plain Ethernet, Internet Wide-Area RDMA Protocol (iWARP), Virtual Protocol Interconnect (VPI) (Infiniband and Ethernet on same card), and RDMA over Converged Ethernet (RoCE) (Infiniband running over Ethernet).
+
+## RDMA Over Converged Ethernet (RoCE)
+
+### Info
+
+- Link layer is Converged Ethernet (CE) (aka data center bridging (DCB)), but the upper protocols are Infiniband (IB).
+- v1 uses an IB network layer and is limited to a single broadcast domain, but v2 uses a UDP/IP network layer (and somewhat transport layer) and is routable over IP routers. Both use an IB transport layer.
+- RoCE requires the NICs and switches to support it.
+- It performs very similar to Infiniband, given equal hardware.
+
+## InfiniBand
+
+### Info
+
+- Each physical connection uses a specified number of links/lanes (typically x4), such that the throughput is aggregated.
+- Per-lane throughput:
+    - SDR: 2Gb/s
+    - DDR: 4Gb/s
+    - QDR: 8Gb/s
+    - FDR10: 10Gb/s
+    - FDR: 13.64Gb/s
+    - EDR: 25Gb/s
+    - HDR: 50Gb/s
+    - NDR: 100Gb/s
+    - XDR: 250Gb/s
+- The network adapter is called a host channel adapter (HCA).
+- It's typically switched, but routing between subnets is supported as well.
+- Channel endpoints between applications are called queue pairs (QPs).
+- To avoid invoking the kernel when communicating over a channel, the kernel allocates and pins a memory region that the userland application and the HCA can both access without further kernel involvement. A local key is used by the application to access the HCA buffers and an unencrypted remote key is used by the remote host to access the HCA buffers.
+- Communication uses either channel semantics (the send/receive model, two-sided) or memory semantics (RDMA model, one-sided). It also supports a special type of memory semantics using atomic operations, which is a useful foundation for e.g. distributed locks.
+- Each subnet requires a subnet manager to be running on a switch or a host, which manages the subnet and is queryable by hosts (agents). For very large subnets, it may be appropriate to run it on a dedicated host. It assigns addresses to endpoints, manages routing tables and more.
+
+### Installation
+
+1. Install RDMA: `apt install rdma-core`
+1. Install user-space RDMA stuff: `apt install ibverbs-providers rdmacm-utils infiniband-diags ibverbs-utils`
+1. Install subnet manager (SM): `apt install opensm`
+    - Only one instance is required on the network, but multiple may be used for redundancy.
+    - A master SM is selected based on configured priority, with GUID as a tie breaker.
+1. Setup IPoIB:
+    - Just like for Ethernet. Just specify the IB interface as the L2 device.
+    - Use an appropriate MTU like 2044.
+1. Make sure ping and ping-pong are working (see examples below).
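+
+Example ifupdown config for IPoIB (`/etc/network/interfaces` snippet, interface name and address are just examples):
+
+```
+auto ib0
+iface ib0 inet static
+    address 10.10.10.10/24
+    mtu 2044
+```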
+
+### Usage
+
+- Show IPoIB status: `ip a`
+- Show local devices:
+    - GUIDs: `ibv_devices`
+    - Basics (1): `ibstatus`
+    - Basics (2): `ibstat`
+    - Basics (3): `ibv_devinfo`
+- Show link statuses for network: `iblinkinfo`
+- Show subnet nodes:
+    - Hosts: `ibhosts`
+    - Switches: `ibswitches`
+    - Routers: `ibrouters`
+- Show active subnet manager(s): `sminfo`
+- Show subnet topology: `ibnetdiscover`
+- Show port counters: `perfquery`
+
+#### Testing
+
+- Ping:
+    - Server: `ibping -S`
+    - Client: `ibping -G <guid>`
+- Ping-pong:
+    - Server: `ibv_rc_pingpong -d <device> [-n <iters>]`
+    - Client: `ibv_rc_pingpong [-n <iters>] <ip>`
+- Other tools:
+    - qperf
+    - perftest
+- Diagnose with ibutils:
+    - Requires the `ibutils` package.
+    - Diagnose fabric: `ibdiagnet -ls 10 -lw 4x` (example)
+    - Diagnose path between two nodes: `ibdiagpath -l 65,1` (example)
+
+## NVLink & NVSwitch
+
+See [CUDA](/se/general/cuda/).
+
+{% include footer.md %}

+ 55 - 0
config/hpc/rocm.md

@@ -0,0 +1,55 @@
+---
+title: ROCm
+breadcrumbs:
+- title: Configuration
+- title: High-Performance Computing (HPC)
+---
+{% include header.md %}
+
+AMD ROCm (Radeon Open Compute), for programming AMD GPUs. AMD's alternative to NVIDIA's CUDA toolkit.
+It uses the runtime API and kernel language HIP, which is compilable for both AMD and NVIDIA GPUs.
+
+### Related Pages
+{:.no_toc}
+
+- [HIP](/config/hpc/hip/)
+
+## Resources
+
+- [ROCm Documentation (AMD ROCm Docs)](https://rocmdocs.amd.com/)
+- [ROCm Installation (AMD ROCm Docs)](https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html)
+
+## Setup
+
+### Linux Installation
+
+Using **Ubuntu 20.04 LTS**.
+
+#### Notes
+
+- Official installation instructions: [ROCm Installation (AMD ROCm Docs)](https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html)
+- **TODO** `video` and `render` groups required to use it? Using `sudo` as a temporary solution works.
+
+#### Steps
+
+1. If the `amdgpu-pro` driver is installed then uninstall it to avoid conflicts.
+1. If using Mellanox ConnectX NICs then Mellanox OFED must be installed before ROCm.
+1. Add the ROCm package repo:
+    1. Install requirements: `sudo apt install libnuma-dev wget gnupg2`
+    1. Add public key: `wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -`
+    1. Add repo: `echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/debian/ ubuntu main' | sudo tee /etc/apt/sources.list.d/rocm.list`
+    1. Update cache: `apt update`
+1. Install: `sudo apt install rocm-dkms`
+1. Fix symlinks and PATH:
+    - ROCm symlink (`/opt/rocm`): `sudo ln -s /opt/rocm-4.2.0 /opt/rocm` (example) (**TODO** Will this automatically point to the right thing?)
+    - Add to PATH: `echo 'export PATH=$PATH:/opt/rocm/bin:/opt/rocm/rocprofiler/bin:/opt/rocm/opencl/bin' | sudo tee -a /etc/profile.d/rocm.sh`
+1. Reboot.
+1. Verify:
+    - `sudo /opt/rocm/bin/rocminfo` (should show e.g. one agent for the CPU and one for the GPU)
+    - `sudo /opt/rocm/opencl/bin/clinfo`
+
+## Usage and Tools
+
+- Show GPU info: `rocm-smi`
+
+{% include footer.md %}

+ 53 - 0
config/hpc/singularity.md

@@ -0,0 +1,53 @@
+---
+title: Singularity
+breadcrumbs:
+- title: Configuration
+- title: High-Performance Computing (HPC)
+---
+{% include header.md %}
+
+A container technology for HPC.
+
+## Information
+
+- For more general information and comparison to other HPC container technologies, see [Containers](/config/hpc/containers/).
+
+## Configuration
+
+**TODO**
+
+## Usage
+
+### Running
+
+- Run command: `singularity exec <opts> <img> <cmd>`
+- Run interactive shell: `singularity shell <opts> <img>`
+- Mounts:
+    - The current directory is mounted and used as the working directory by default.
+- Env vars:
+    - Env vars are copied from the host, but `--cleanenv` may be used to avoid that.
+    - Extra can be specified using `--env <var>=<val>`.
+- GPUs:
+    - See extra notes.
+    - Specify `--nv` (NVIDIA) or `--rocm` (AMD) to expose GPUs.
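+
+Example run commands combining the options above (image name and commands are just examples):
+
+```sh
+# Run nvidia-smi inside the container with NVIDIA GPUs exposed and a clean environment
+singularity exec --nv --cleanenv my_image.sif nvidia-smi
+# Interactive shell with an extra env var set
+singularity shell --env MY_VAR=1 my_image.sif
+```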
+
+### Images
+
+- Pull image from repo:
+    - Will place the image as a SIF file (`<image>_<tag>.sif`) in the current directory.
+    - Docker Hub: `singularity pull docker://<img>:<tag>`
+
+### GPUs
+
+- The GPU driver library must be exposed in the container and `LD_LIBRARY_PATH` must be updated.
+- Specify `--nv` (NVIDIA) or `--rocm` (AMD) when running a container.
+
+### MPI
+
+- Using the "bind approach", where MPI and the interconnect is bind mounted into the container.
+- MPI is installed in the container in order to build the application with dynamic linking.
+- MPI is installed on the host such that the application can dynamically load it at run time.
+- The MPI implementations must be of the same family and preferably the same version (for ABI compatibility). While MPICH, IntelMPI, MVAPICH and CrayMPICH use the same ABI, Open MPI does not comply with that ABI.
+- When running the application, both the MPI implementation and the interconnect must be bind mounted into the container and an appropriate `LD_LIBRARY_PATH` must be provided for the MPI libraries. This may be statically configured by the system admin.
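+
+A sketch of the bind approach (all host paths and names are hypothetical and depend on the host MPI installation):
+
+```sh
+# Bind the host MPI installation into the container and point the loader at it
+mpirun -n 4 singularity exec \
+    --bind /opt/openmpi:/opt/openmpi \
+    --env LD_LIBRARY_PATH=/opt/openmpi/lib \
+    my_mpi_app.sif /opt/app/mpi_app
+```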
+
+{% include footer.md %}

+ 45 - 4
config/linux-server/applications.md

@@ -75,9 +75,31 @@ Sends an emails when APT updates are available.
 
 ## BIND
 
+### Info
+
 - Aka "named".
 
-**TODO**
+### Config
+
+- Should typically be installed directly on the system, but the Docker image is pretty good too.
+    - Docker image: [internetsystemsconsortium/bind9 (Docker Hub)](https://hub.docker.com/internetsystemsconsortium/bind9)
+- Docs and guides:
+    - [The BIND 9 Administrator Reference Manual (ARM)](https://bind9.readthedocs.io/)
+    - [DNSSEC Guide (BIND 9 docs)](https://bind9.readthedocs.io/en/latest/dnssec-guide.html)
+    - [Tutorial: How To Configure Bind as a Caching or Forwarding DNS Server on Ubuntu 16.04 (DigitalOcean)](https://www.digitalocean.com/community/tutorials/how-to-configure-bind-as-a-caching-or-forwarding-dns-server-on-ubuntu-16-04)
+    - [Tutorial: How To Setup DNSSEC on an Authoritative BIND DNS Server (DigitalOcean)](https://www.digitalocean.com/community/tutorials/how-to-setup-dnssec-on-an-authoritative-bind-dns-server-2)
+
+### Usage
+
+- Validate config: `named-checkconf`
+- Validate DNSSEC validation:
+    - `dig cloudflare.com @<server>` should give status `NOERROR` and contain the `ad` flag (for "authentic data", i.e. it passed DNSSEC validation).
+    - `dig www.dnssec-failed.org @<server>` should give status `SERVFAIL`.
+    - `dig www.dnssec-failed.org @<server> +cd` (for "checking disabled", useful for DNSSEC debugging) should give status `NOERROR` but no `ad` flag.
+- Validate DNSSEC signing:
+    - Resolve with dig and a validating server.
+    - [Verisign DNSSEC Debugger](https://dnssec-debugger.verisignlabs.com/)
+    - [DNSViz](https://dnsviz.net/)
 
 ## bitwarden_rs
 
@@ -103,6 +125,22 @@ See [Storage: Ceph](/config/linux-server/storage/#ceph).
 - Dry-run renew: `certbot renew --dry-run [--staging]`
 - Revoke certificate: `certbot revoke --cert-path <cert>`
 
+## Chrony
+
+### Setup (Server)
+
+1. Install: `apt install chrony`
+1. Modify config (`/etc/chrony/chrony.conf`):
+    - (Optional) Add individual servers: `server <address> iburst`
+    - (Optional) Add pool of servers (a name resolving to multiple servers): `pool <address> iburst`
+    - (Optional) Allow clients: `allow {all|<network>}`
+1. Restart: `systemctl restart chrony`
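+
+Example `/etc/chrony/chrony.conf` additions (server pool and client network are just examples):
+
+```
+pool 2.debian.pool.ntp.org iburst
+allow 10.0.0.0/22
+```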
+
+### Usage
+
+- Check tracking: `chronyc tracking`
+- Check sources: `chronyc sources`
+
 ## DDNS
 
 ### Cloudflare
@@ -331,7 +369,7 @@ Example `/etc/exports`:
 
 1. Disable systemd-timesyncd NTP client by disabling and stopping `systemd-timesyncd`.
 1. Install `ntp`.
-1. In `/etc/ntp.conf`, replace existing servers/pools with `ntp.justervesenet.no` with the `iburst` option.
+1. Configure servers/pool in `/etc/ntp.conf`, with the `iburst` option.
 1. Test with `ntpq -pn` (it may take a minute to synchronize).
 
 ## NUT
@@ -414,11 +452,14 @@ echo -e "Time: $(date)\nMessage: $@" | mail -s "NUT: $@" root
 
 ### Usage
 
+- Show UPSes: `upsc -l`
+- Show UPS vars: `upsc <ups>`
+
 #### Query the Server
 
 1. Telnet into it: `telnet localhost 3493`
-1. List UPSes: `LIST UPS` (the second field is the UPS ID)
-1. List variables: `LIST VAR <ups>`
+1. Show UPSes: `LIST UPS`
+1. Show UPS vars: `LIST VAR <ups>`
 
 ## OpenSSL
 

+ 61 - 29
config/linux-server/debian.md

@@ -10,35 +10,75 @@ Using **Debian 10 (Buster)**.
 
 ## Basic Setup
 
-If using automation to provision the system, only the "installation" part is necessary.
-If using a hypervisor, the VM may be turned into a template after the "installation" part, so that you only need to do the manual installation once and then clone the template when you need more VMs.
-
 ### Installation
 
 - Always verify the downloaded installation image after downloading it.
-- Use UEFI if possible.
+- If installing in a Proxmox VE VM, see [Proxmox VE: VMs: Initial Setup](/config/virt-cont/proxmox-ve/#initial-setup).
+- Prefer UEFI if possible.
 - Use the non-graphical installer. It's basically the same as the graphical one.
 - Localization:
+    - For automation-managed systems: It doesn't matter.
     - Language: United States English.
     - Location: Your location.
     - Locale: United States UTF-8 (`en_US.UTF-8`).
     - Keymap: Your keyboard's keymap.
-- Use an FQDN as the hostname. It'll set both the shortname and the FQDN.
+- Network settings:
+    - For automation-managed systems: Both DHCP and static IP addresses are fine, do whatever is more practical.
+    - For static servers: Just configure the static IP addresses.
+- Use an FQDN as the hostname.
+    - For automation-managed systems: It doesn't matter, just leave it as `debian` or something.
+    - It'll automatically split it into the shortname and the FQDN.
     - If using automation to manage the system, this doesn't matter.
 - Use separate password for root and your personal admin user.
-    - If using automation to manage the system, the passwords may be something temporary and the non-root user may be called e.g. `ansible` and used for automation.
+    - For automation-managed systems: The passwords may be something temporary and the non-root user may be called e.g. `ansible` (for the initial automation).
 - System disk partitioning:
-    - "Simple" system: Guided, single partition, use all available space.
-    - "Complex" system: Manually partition, see [system storage](/config/linux-server/storage/#system-storage).
+    - Simple system: Guided, single partition, use all available space.
+    - Advanced system: Manually partition, see [system storage](/config/linux-server/storage/#system-storage).
     - Swap can be set up later as a file or LVM volume.
     - When using LVM: Create the partition for the volume group, configure LVM (separate menu), configure the LVM volumes (filesystem and mount).
-- At the software selection menu, select only "SSH server" and "standard system utilities".
+- Package manager:
+    - Just pick whatever it suggests.
+- Software selection:
+    - Select only "SSH server" and "standard system utilities".
 - If it asks to install non-free firmware, take note of the packages so they can be installed later.
-- Install GRUB to the used disk (not partition).
+- GRUB bootloader:
+    - Install to the suggested root disk (e.g. `/dev/sda`).
+
+### Prepare for Ansible Configuration
+
+Do this if you're going to use Ansible to manage the system.
+This is mainly to make the system accessible by Ansible, which can then take over the configuration.
+If creating a template VM, run the first instructions before saving the template and then run the last instructions on cloned VMs.
+
+1. Upgrade all packages: `apt update && apt full-upgrade`
+1. If running in a QEMU VM (e.g. in Proxmox), install the agent: `apt install qemu-guest-agent`
+1. Setup sudo for the automation user: `apt install sudo && usermod -aG sudo ansible`
+1. (Optional) Convert the VM into a template and clone it into a new VM to be used hereafter.
+1. Update the IP addresses in `/etc/network/interfaces` (see the example below).
+1. Update the DNS server(s) in `/etc/resolv.conf`: `nameserver 1.1.1.1`
+1. Reboot.
+
+Example `/etc/network/interfaces`:
+
+```
+source /etc/network/interfaces.d/*
 
-### Reconfigure Clones
+auto lo
+iface lo inet loopback
 
-If you didn't already configure this during the installation, e.g. if cloning a template VMs or something.
+allow-hotplug ens18
+iface ens18 inet static
+    address 10.0.0.100/22
+    gateway 10.0.0.1
+iface ens18 inet6 static
+    address fdaa:aaaa:aaaa:0::100/64
+    gateway fdaa:aaaa:aaaa:0::1
+    accept_ra 0
+```
+
+### Manual Configuration
+
+The first steps may be skipped if already configured during installation (i.e. not cloning a template VM).
 
 1. Check the system status:
     - Check for failed services: `systemctl --failed`
@@ -56,9 +96,6 @@ If you didn't already configure this during the installation, e.g. if cloning a
     - Set both the shortname and FQDN in `/etc/hosts` using the following format: `127.0.0.1 <fqdn> <shortname>`
         - If the server has a static IP address, use that instead of 127.0.0.1.
     - Check the hostnames with `hostname` (shortname) and `hostname --fqdn` (FQDN).
-
-### Basic Configuration
-
 1. Packages:
     - (Optional) Enable the `contrib` and `non-free` repo areas: `add-apt-repository <area>`
         - Or by setting `main contrib non-free` for every `deb`/`deb-src` in `/etc/apt/sources.list`.
@@ -71,10 +108,9 @@ If you didn't already configure this during the installation, e.g. if cloning a
     - Fix YAML formatting globally: In `/etc/vim/vimrc.local`, add `autocmd FileType yaml setlocal ts=2 sts=2 sw=2 expandtab`.
 1. Add mount options:
     - Setup hidepid:
-        - **TODO** Use existing `adm` group instead of creating a new one?
-        - Add PID monitor group: `groupadd -g 500 hidepid` (example GID)
-        - Add your personal user to the PID monitor group: `usermod -aG hidepid <user>`
-        - Enable hidepid in `/etc/fstab`: `proc /proc proc defaults,hidepid=2,gid=500 0 0`
+        - Note: The `adm` group will be granted access.
+        - Add your personal user to the PID monitor group: `usermod -aG adm <user>`
+        - Enable hidepid in `/etc/fstab`: `proc /proc proc defaults,hidepid=2,gid=<adm-gid> 0 0` (using the numerical GID of `adm`)
     - (Optional) Disable the tiny swap partition added by the guided installer by commenting it in the fstab.
     - (Optional) Setup extra mount options: See [Storage](system.md).
     - Run `mount -a` to validate fstab.
@@ -84,7 +120,7 @@ If you didn't already configure this during the installation, e.g. if cloning a
     - Add the relevant groups (using `usermod -aG <group> <user>`):
         - `sudo` for sudo access.
         - `systemd-journal` for system log access.
-        - `hidepid` (whatever it's called) if using hidepid, to see all processes.
+        - `adm` for hidepid, to see all processes (if using hidepid).
     - Add your personal SSH pubkey to `~/.ssh/authorized_keys` and fix the owner and permissions (700 for dir, 600 for file).
         - Hint: Get `https://github.com/<user>.keys` and filter the results.
     - Try logging in remotely and gain root access through sudo.
@@ -182,6 +218,8 @@ Prevent enabled (and potentially untrusted) interfaces from accepting router adv
 
 #### DNS
 
+**TODO** Setup `resolvconf` to prevent automatic `resolv.conf` changes.
+
 ##### Using systemd-resolved (Alternative 1)
 
 1. (Optional) Make sure no other local DNS servers (like dnsmasq) is running.
@@ -191,12 +229,7 @@ Prevent enabled (and potentially untrusted) interfaces from accepting router adv
     - (Optional) `DNSSEC`: Set to `no` to disable (only if you have a good reason to, like avoiding the chicken-and-egg problem with DNSSEC and NTP).
 1. (Optional) If you're hosting a DNS server on this machine, set `DNSStubListener=no` to avoid binding to port 53.
 1. Enable the service: `systemctl enable --now systemd-resolved.service`
-1. Fix `/etc/resolv.conf`:
-    - Note: The systemd-generated one is `/run/systemd/resolve/stub-resolv.conf`.
-    - Note: Simply symlinking `/etc/resolv.conf` to the systemd one will cause dhclient to overwrite it if using DHCP for any interfaces, so don't do that.
-    - Note: This method may cause `/etc/resolv.conf` to become outdated if the systemd one changes for some reason (e.g. if the search domains change).
-    - After configuring and starting resolved, copy (not link) `resolv.conf`: `cp /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf`
-    - Make it immutable so dhclient can't update it: `chattr +i /etc/resolv.conf`
+1. Link `/etc/resolv.conf`: `ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf`
 1. Check status: `resolvectl`
 
 ##### Using resolv.conf (Alternative 2)
@@ -269,12 +302,11 @@ Everything here is optional.
     - Install: `apt install lynis`
     - Run: `lynis audit system`
 - MOTD:
-    - Clear `/etc/motd` and `/etc/issue`.
-    - Download [dmotd.sh](https://github.com/HON95/scripts/blob/master/server/linux/general/dmotd.sh) to `/etc/profile.d/`.
+    - Clear `/etc/motd`, `/etc/issue` and `/etc/issue.net`.
+    - Download [dmotd.sh](https://github.com/HON95/scripts/blob/master/linux/login/dmotd.sh) to `/etc/profile.d/`.
     - Install the dependencies: `neofetch lolcat`
     - Add an ASCII art (or Unicode art) logo to `/etc/logo`, using e.g. [TAAG](http://patorjk.com/software/taag/).
     - (Optional) Add a MOTD to `/etc/motd`.
-    - (Optional) Clear or change the pre-login message in `/etc/issue`.
     - Test it: `su - <some-normal-user>`
 - Setup monitoring:
     - Use Prometheus with node exporter or something and set up alerts.

+ 0 - 69
config/linux-server/networking.md

@@ -1,69 +0,0 @@
----
-title: Linux Server Networking
-breadcrumbs:
-- title: Configuration
-- title: Linux Server
----
-{% include header.md %}
-
-Using **Debian**, unless otherwise stated.
-
-### TODO
-{:.no_toc}
-
-- Migrate stuff from Debian page.
-- Add link to Linux router page. Maybe combine.
-- Add ethtool notes from VyOS.
-
-## Related Pages
-
-- [Linux Switching & Routing](/config/network/linux/)
-
-## InfiniBand
-
-### Installation
-
-1. Install RDMA: `apt install rdma-core`
-1. Install user-space RDMA stuff: `apt install ibverbs-providers rdmacm-utils infiniband-diags ibverbs-utils`
-1. Install subnet manager (SM): `apt install opensm`
-    - Only one instance is required on the network, but multiple may be used for redundancy.
-    - A master SM is selected based on configured priority, with GUID as a tie breaker.
-1. Setup IPoIB:
-    - Just like for Ethernet. Just specify the IB interface as the L2 device.
-    - Use an appropriate MTU like 2044.
-1. Make sure ping and ping-pong is working (see examples below).
-
-### Usage
-
-- Show IPoIB status: `ip a`
-- Show local devices:
-    - GUIDs: `ibv_devices`
-    - Basics (1): `ibstatus`
-    - Basics (2): `ibstat`
-    - Basics (3): `ibv_devinfo`
-- Show link statuses for network: `iblinkinfo`
-- Show subnet nodes:
-    - Hosts: `ibhosts`
-    - Switches: `ibswitches`
-    - Routers: `ibrouters`
-- Show active subnet manager(s): `sminfo`
-- Show subnet topology: `ibnetdiscover`
-- Show port counters: `perfquery`
-
-#### Testing
-
-- Ping:
-    - Server: `ibping -S`
-    - Client: `ibping -G <guid>`
-- Ping-pong:
-    - Server: `ibv_rc_pingpong -d <device> [-n <iters>]`
-    - Client: `ibv_rc_pingpong [-n <iters>] <ip>`
-- Other tools:
-    - qperf
-    - perftest
-- Diagnose with ibutils:
-    - Requires the `ibutils` package.
-    - Diagnose fabric: `ibdiagnet -ls 10 -lw 4x` (example)
-    - Diagnose path between two nodes: `ibdiagpath -l 65,1` (example)
-
-{% include footer.md %}

+ 30 - 10
config/linux-server/storage-zfs.md

@@ -82,17 +82,25 @@ The installation part is highly specific to Debian 10 (Buster). The backports re
 ### Pools
 
 - Recommended pool options:
-    - Set physical block/sector size: `ashift=<9|12>`
+    - Typical example: `-o ashift=<9|12> -O compression=zstd -O xattr=sa -O atime=off -O relatime=on`
+    - Specifying options during creation: For `zpool`/pools, use `-o` for pool options and `-O` for dataset options. For `zfs`/datasets, use `-o` for dataset options.
+    - Set physical block/sector size (pool option): `ashift=<9|12>`
         - Use 9 for 512 (2^9) and 12 for 4096 (2^12). Use 12 if unsure (bigger is safer).
-    - Enable compression: `compression=zstd`
+    - Enable compression (dataset option): `compression=zstd`
         - Use `lz4` for boot drives (`zstd` booting isn't currently supported) or if `zstd` isn't yet available in the version you're using.
-    - Store extended attributes in the inodes: `xattr=sa`
-        - `on` is default and stores them in a hidden file.
-    - Relax access times: `atime=off` and `relatime=on`
+    - Store extended attributes in the inodes (dataset option): `xattr=sa`
+        - The default is `on`, which stores them in a hidden file.
+    - Relax access times (dataset option): `atime=off` and `relatime=on`
     - Don't enable dedup.
 - Create pool:
     - Format: `zpool create [options] <name> <levels-and-drives>`
-    - Basic example: `zpool create -o ashift=<9|12> -O compression=zstd -O xattr=sa <name> [mirror|raidz|raidz2|...] <drives>`
+    - Basic example: `zpool create [-f] [options] <name> [mirror|raidz|raidz2|...] <drives>`
+        - Use `-f` (force) if the disks aren't clean.
+        - See example above for recommended options.
+    - The pool definition is hierarchical, where top-level elements are striped.
+        - RAID 0 (striped): `<drives>`
+        - RAID 1 (mirrored): `mirror <drives>`
+        - RAID 10 (stripe of mirrors): `mirror <drives> mirror <drives>`
     - Create encrypted pool: See encryption section.
     - Use absolute drive paths (`/dev/disk/by-id/` or similar).
 - View pool activity: `zpool iostat [-v] [interval]`
@@ -183,12 +191,20 @@ The installation part is highly specific to Debian 10 (Buster). The backports re
 - Info:
     - ZoL v0.8.0 and newer supports native encryption of pools and datasets. This encrypts all data except some metadata like pool/dataset structure, dataset names and file sizes.
     - Datasets can be scrubbed, resilvered, renamed and deleted without unlocking them first.
-    - Datasets will by default inherit encryption and the encryption key (the "encryption root") from the parent pool/dataset.
+    - Datasets will by default inherit encryption and the encryption key from the parent pool/dataset (or the nearest "encryption root").
     - The encryption suite can't be changed after creation, but the keyformat can.
+    - Snapshots and clones always inherit from the original dataset.
 - Show stuff:
+    - Encryption: `zfs get encryption` (`off` means unencrypted, otherwise it shows the alg.)
     - Encryption root: `zfs get encryptionroot`
-    - Key status: `zfs get keystatus`. `unavailable` means locked and `-` means not encrypted.
+    - Key format: `zfs get keyformat`
+    - Key location: `zfs get keylocation` (only shows for the encryption root and `none` for encrypted children)
+    - Key status: `zfs get keystatus` (`available` means unlocked, `unavailable` means locked and `-` means not encrypted or snapshot)
     - Mount status: `zfs get mountpoint` and `zfs get mounted`.
+- Locking and unlocking:
+    - Manually unlock: `zfs load-key <dataset>`
+    - Manually lock: `zfs unload-key <dataset>`
+    - Automatically unlock and mount everything: `zfs mount -la` (`-l` to load key, `-a` for all)
 - Create a password encrypted pool:
     - Create: `zpool create -O encryption=aes-128-gcm -O keyformat=passphrase ...`
 - Create a raw key encrypted pool:
@@ -204,13 +220,17 @@ The installation part is highly specific to Debian 10 (Buster). The backports re
     1. Note: The new dataset will become its own encryption root instead of inheriting from any parent dataset/pool.
 - Change encryption property:
     - The key must generally already be loaded.
-    - Change `keyformat`, `keylocation` or `pbkdf2iters`: `zfs change-key -o <property>=<value> <dataset>`
-    - Inherit key from parent: `zfs change-key -i <dataset>`
+    - The encryption properties `keyformat`, `keylocation` and `pbkdf2iters` are inherited from the encryptionroot instead, unlike normal properties.
+    - Show encryptionroot: `zfs get encryptionroot`
+    - Change encryption properties: `zfs change-key -o <property>=<value> <dataset>`
+    - Change key location for locked dataset: `zfs set keylocation=file://<file> <dataset>` (**TODO** difference between `zfs set keylocation= ...` and `zfs change-key -o keylocation= ...`?)
+    - Inherit key from parent (join parent encryption root): `zfs change-key -i <dataset>`
 - Send raw encrypted snapshot:
     - Example: `zfs send -Rw <dataset>@<snapshot> | <...> | zfs recv <dataset>`
     - As with normal sends, `-R` is useful for including snapshots and metadata.
     - Sending encrypted datasets requires using raw (`-w`).
     - Encrypted snapshots sent as raw may be sent incrementally.
+    - Make sure to check the encryption root, key format, key location etc. to make sure they're what they should be.
 
 ### Error Handling and Replacement
 

+ 31 - 2
config/network/fs-fsos-switches.md

@@ -9,17 +9,43 @@ breadcrumbs:
 ### Using
 {:.no_toc}
 
-- FS S3700-24T4F
+- FS S5860-20SQ (core switch)
+- FS S3700-24T4F (access switch)
 
-## Info
+## Basics
 
 - Default credentials: Username `admin` and password `admin`.
 - Default mgmt. IP address: `192.168.1.1/24`
 - By default, SSH, Telnet and HTTP servers are accessible using the default mgmt. address and credentials.
+- Serial config: RS-232 w/ RJ45, baud 115200, 8 data bits, no parity bits, 1 stop bit, no flow control.
 - The default VLAN is VLAN1.
 
 ## Initial Setup
 
+### Core Switch
+
+Using an FS S5860-20SQ.
+
+**TODO**
+
+Random notes (**TODO**):
+
+1. (Optional) Split 40G-interface (QSFP+) into 4x 10G (SFP+): `split interface <if>`
+1. Configure RSTP:
+    - Set protocol: `spanning-tree mode rstp` (default MSTP)
+    - Set priority: `spanning-tree priority <priority>` (default 32768, should be a multiple of 4096, use e.g. 32768 for access, 16384 for distro and 8192 for core)
+    - Set hello time: `spanning-tree hello-time <seconds>` (default 2s)
+    - Set maximum age: `spanning-tree max-age <seconds>` (default 20s)
+    - Set forward delay: `spanning-tree forward-time <seconds>` (default 15s)
+    - Enable: `spanning-tree`
+    - **TODO** Enabled on all interfaces and VLANs by default?
+    - **TODO** Portfast for access ports? `spanning-tree link-type ...`
+    - **TODO** Guards.
+
+### Access Switch
+
+Using an FS S3700-24T4F.
+
 1. Connect to the switch using serial.
     - Using RS-232 w/ RJ45, baud 115200, 8 data bits, no parity bits, 1 stop bit, no flow control.
     - Use `Ctrl+H` for backspace.
@@ -125,6 +151,9 @@ breadcrumbs:
 - Interfaces:
     - Show L2 brief: `show int brief`
     - Show L3 brief: `show ip int brief`
+- STP:
+    - Show details: `show spanning-tree`
+    - Show overview and interfaces: `show spanning-tree summary`
 - LACP:
     - Show semi-detailed overview: `show aggregator-group [n] brief`
     - Show member ports: `show aggregator-group [n] summary`

+ 1 - 0
config/network/juniper-junos-general.md

@@ -48,6 +48,7 @@ breadcrumbs:
     - Change context to container statement: `edit <path>`
     - Go up in context: `up` or `top`
     - Show configuration for current level: `show`
+- Perform operation on multiple interfaces or similar: `wildcard range set int ge-0/0/[0-47] unit 0 family ethernet-switching` (example)
 - Commit config changes: `commit [comment <comment>] [confirmed] [and-quit]`
     - `confirmed` automatically rolls back the commit if it is not confirmed within a time limit.
     - `and-quit` will quit configuration mode after a successful commit.

+ 25 - 11
config/network/juniper-junos-switches.md

@@ -23,6 +23,13 @@ breadcrumbs:
 
 - [Juniper EX3300 Fan Mod](/guides/network/juniper-ex3300-fanmod/)
 
+## Basics
+
+- Default credentials: Username `root` without a password (drops you into the shell instead of the CLI).
+- Default mgmt. IP address: Using DHCPv4.
+- Serial config: RS-232 w/ RJ45, baud 115200, 8 data bits, no parity bits, 1 stop bit, no flow control.
+- Native VLAN: 0, aka `default`
+
 ## Initial Setup
 
 1. Connect to the switch using serial:
@@ -30,7 +37,7 @@ breadcrumbs:
 1. Login:
     - Username `root` and no password.
     - Logging in as root will always start the shell. Run `cli` to enter the operational CLI.
-1. (Optional) Disable default virtual chassis ports (VCPs) if not used:
+1. (Optional) Free virtual chassis ports (VCPs) for normal use:
     1. Enter op mode.
     1. Show VCPs: `show virtual-chassis vc-port`
     1. Remove VCPs: `request virtual-chassis vc-port delete pic-slot <pic-slot> port <port-number>`
@@ -112,8 +119,16 @@ breadcrumbs:
     - **TODO**
 1. Enable EEE:
     - **TODO**
-1. Configure RSTP:
-    - RSTP is the default STP variant for Junos.
+1. (Optional) Configure RSTP:
+    - Note: RSTP is the default STP variant for Junos.
+    - Enter config section: `edit protocols rstp`
+    - Set priority: `set bridge-priority <priority>` (default 32768, should be a multiple of 4096, use e.g. 32768 for access, 16384 for distro and 8192 for core)
+    - Set hello time: `set hello-time <seconds>` (default 2s)
+    - Set maximum age: `set max-age <seconds>` (default 20s)
+    - Set forward delay: `set forward-delay <seconds>` (default 15s)
+    - **TODO** Portfast for access ports?
+    - **TODO** Guards.
+    - **TODO** Enabled on all interfaces and VLANs by default?
 1. Configure SNMP:
     - Note: SNMP is extremely slow on the Juniper switches I've tested it on.
     - Enable public RO access: `set snmp community public authorization read-only`
@@ -127,7 +142,13 @@ breadcrumbs:
 ### Interfaces
 
 - Disable interface or unit: `set disable`
-- Perform operation on multiple interfaces: `wildcard range set int ge-0/0/[0-47] unit 0 family ethernet-switching` (example)
+- Show transceiver info:
+    - `show interfaces diagnostics optics [if]`
+    - `show interfaces media [if]` (less info, only works if interface is up)
+
+### STP
+
+- Show interface status: `show spanning-tree interface`
 
 ## Virtual Chassis
 
@@ -181,11 +202,4 @@ breadcrumbs:
 
 Virtual Chassis Fabric (VCF) evolves VC into a spine-and-leaf architecture. While VC focuses on simplified management, VCF focuses on improved data center connectivity. Only certain switches (like the QFX5100) support this feature.
 
-## Miscellanea
-
-- Serial:
-    - RS-232 w/ RJ45 (Cisco-like).
-    - Baud 9600 (default).
-    - 8 data bits, no parity, 1 stop bits, no flow control.
-
 {% include footer.md %}

+ 2 - 1
config/pc/applications.md

@@ -123,7 +123,8 @@ Note: Since Steam requires 32-bit (i386) variants of certain NVIDIA packages, an
 
 ### Miscellanea
 
-- Windows home dir (typical save location): `~/.local/share/Steam/steamapps/compatdata/<some_id>/pfx/drive_c/users/steamuser/`
+- Proton Windows home dir: `~/.local/share/Steam/steamapps/compatdata/<some_id>/pfx/drive_c/users/steamuser/`
+- Proton Windows home dir (Flatpak): `~/.var/app/com.valvesoftware.Steam/.steamlib/steamapps/compatdata/374320/pfx/drive_c/users/steamuser/`
 
 ## tmux
 

+ 62 - 10
config/virt-cont/docker.md

@@ -17,16 +17,32 @@ Using **Debian**.
     - In `/etc/default/grub`, add `cgroup_enable=memory swapaccount=1` to `GRUB_CMDLINE_LINUX`.
     - Run `update-grub` and reboot the system.
 1. (Recommended) Setup IPv6 firewall and NAT:
-    - By default, Docker does not add any IPTables NAT rules or filter rules, which leaves Docker IPv6 networks open (bad) and requires using a routed prefix (sometimes inpractical). While using using globally routable IPv6 is the gold standard, Docker does not provide firewalling for that when not using NAT as well.
+    - (Info) By default, Docker does not enable IPv6 for containers and does not add any IP(6)Tables rules for the NAT or filter tables, which you need to take into consideration if you plan to use IPv6 (with or without automatic IPTables rules). See the miscellaneous note below on IPv6 support for more info about its brokenness and the implications of that. Docker _does_ however recently support handling IPv6 subnets similarly to IPv4, meaning using NAT masquerading and appropriate firewalling. It doesn't work properly for internal networks, though, as it breaks IPv6 ND. The following steps describe how to set that up, as it is the only working solution IMO. MACVLANs with external routers will not be NAT-ed.
     - Open `/etc/docker/daemon.json`.
     - Set `"ipv6": true` to enable IPv6 support at all.
-    - Set `"fixed-cidr-v6": "<prefix/64>"` to some [generated](https://simpledns.plus/private-ipv6) (ULA) or publicly routable (GUA) /64 prefix, to be used by the default bridge.
-    - Set `"ip6tables": true` to enable adding filter and NAT rules to IP6Tables (required for both security and NAT). This only affects non-internal bridges and not e.g. MACVLANs with external routers.
-1. (Optional) Change IPv4 network pool:
-    - - In `/etc/docker/daemon.json`, set `"default-address-pools": [{"base": "10.0.0.0/16", "size": "24"}]`.
-1. (Optional) Change default DNS servers for containers:
+    - Set `"fixed-cidr-v6": "<prefix/64>"` to some [random](https://simpledns.plus/private-ipv6) (ULA) (if using NAT masq.) or routable (GUA or ULA) (if not using NAT masq.) /64 prefix, to be used by the default bridge.
+    - Set `"ip6tables": true` to enable automatic filter and NAT rules through IP6Tables (required for both security and NAT).
+1. (Recommended) Change the cgroup manager to systemd:
+    - In `/etc/docker/daemon.json`, set `"exec-opts": ["native.cgroupdriver=systemd"]`.
+    - It defaults to Docker's own cgroup manager/driver called cgroupfs.
+    - systemd (as the init system for most modern Linux systems) also functions as a cgroup manager, and using multiple cgroup managers may cause the system to become unstable under resource pressure.
+    - If the system already has existing containers, they should be completely recreated after changing the cgroup manager.
+1. (Optional) Change the storage driver:
+    - By default it uses the `overlay2` driver, which is recommended for most setups. (`aufs` was the default before that.)
+    - The only other alternatives worth consideration are `btrfs` and `zfs`, if the system is configured for those file systems.
+1. (Recommended) Change IPv4 network pool:
+    - In `/etc/docker/daemon.json`, set `"default-address-pools": [{"base": "172.17.0.0/12", "size": 24}]`.
+    - For local networks (not Swarm overlays), it defaults to pool `172.17.0.0/12` with `/16` allocations, resulting in a maximum of `2^(16-12)=16` allocations.
+1. (Recommended) Change default DNS servers for containers:
     - In `/etc/docker/daemon.json`, set `"dns": ["1.1.1.1", "2606:4700:4700::1111"]` (example using Cloudflare) (3 servers max).
     - It defaults to `8.8.8.8` and `8.8.4.4` (Google).
+1. (Optional) Change the logging options (JSON file driver):
+    - It defaults to the JSON file driver with a single file of unlimited size.
+    - Configured globally in `/etc/docker/daemon.json`.
+    - Set the driver (explicitly): `"log-driver": "json-file"`
+    - Set the max file size: `"log-opts": { "max-size": "10m" }`
+    - Set the max number of files (for log rotation): `"log-opts": { "max-file": "5" }`
+    - Set the compression for rotated files: `"log-opts": { "compress": "enabled" }`
 1. (Optional) Enable Prometheus metrics endpoint:
     - This only exports internal Docker metrics, not anything about the containers (use cAdvisor for that).
     - In `/etc/docker/daemon.json`, set `"experimental": true` and `"metrics-addr": "[::]:9323"`.
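+
+A combined `/etc/docker/daemon.json` sketch covering the settings above (the IPv6 prefix and DNS servers are illustrative placeholders; adjust to your environment):
+
+```
+{
+  "ipv6": true,
+  "fixed-cidr-v6": "fd00:dead:beef::/64",
+  "ip6tables": true,
+  "exec-opts": ["native.cgroupdriver=systemd"],
+  "default-address-pools": [{"base": "172.17.0.0/12", "size": 24}],
+  "dns": ["1.1.1.1", "2606:4700:4700::1111"],
+  "log-driver": "json-file",
+  "log-opts": {"max-size": "10m", "max-file": "5"},
+  "experimental": true,
+  "metrics-addr": "[::]:9323"
+}
+```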
@@ -88,7 +104,7 @@ Using **Debian**.
 
 #### Fix Docker Compose No-Exec Tmp-Dir
 
-Docker Compose will fail to work if `/tmp` has `noexec`.
+Docker Compose will fail to work if `/tmp` is mounted with `noexec`.
 
 1. Move `/usr/local/bin/docker-compose` to `/usr/local/bin/docker-compose-normal`.
 1. Create `/usr/local/bin/docker-compose` with the contents below and make it executable.
@@ -111,14 +127,50 @@ The toolkit is used for running CUDA applications within containers.
 
 See the [installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker).
 
+## Best Practices
+
+- Building:
+    - Use simple base images without stuff you don't need (especially for the final image if using multi-stage builds). `alpine` is nice, but uses musl libc instead of glibc, which may cause problems for certain apps.
+    - Use official base images you can trust.
+    - Build completely inside the container to avoid relying on external tools and libraries (for better reproducibility and portability).
+    - Use multi-stage builds to separate the heavier build environment/image (containing all the build tools and many layers) from the final image, with the built app copied into it from the previous stage (see the Dockerfile sketch after this list).
+    - To exploit cacheability when building an image multiple times (e.g. during development), put everything that doesn't change (e.g. installing packages) at the top of the Dockerfile and stuff that changes frequently (e.g. copying source files and compilation) as close to the bottom as possible.
+    - Use `COPY` instead of `ADD`, unless you actually need some of the fancy and sometimes unexpected features of `ADD`.
+    - Use `ARG`s and `ENV`s (with defaults) for vars you may want to change before building.
+    - `EXPOSE` is pointless and purely informational.
+    - Use `ENTRYPOINT` (in array form) to specify the entrypoint script or application and `CMD` (in array form) to specify default additional arguments to the `ENTRYPOINT`.
+    - Create a `.dockerignore` file, similar to `.gitignore` files, to avoid copying useless or sensitive files into the container.
+- Signal handling:
+    - Make sure your application is handling signals correctly (e.g. such that it stops properly). The initial process in the container runs with PID 1, which is typically reserved for the init process and is handled specially by the kernel (e.g. signals without an explicitly installed handler are ignored rather than triggering their default action).
+    - If your application does not handle signals properly internally, build the image with [tini](https://github.com/krallin/tini) as the entrypoint or run the container with `--init` to make Docker inject tini as the entrypoint.
+- Don't run as root:
+    - Either set a static user in the Dockerfile, change to a specific user (static or dynamic) in the entrypoint script or app itself, or specify a user through Docker run (or equivalent). The latter approach (specified in Docker run) is assumed hereafter.
+    - The app may still be built by root and may be owned by root since the user running it generally shouldn't need to modify the app itself.
+    - If the app needs to modify files, put them in `/tmp`. Maybe make it easy to override the paths for more flexibility wrt. volumes and bind mounts.
+- Credentials and sensitive files:
+    - Don't hard code them anywhere.
+    - Don't ever put them on the image file system during building as they may get caught by one of the image layers.
+    - Specify them as mounted files (with proper permissions), env vars (slightly controversial), Docker secrets or similar.
+- Implement health checks.
+- Docker Compose:
+    - Drop the `version` property (it's deprecated).
+    - Use YAML aliases and anchors to avoid repeating yourself too much. To create an anchor, add `&<anchor>` after a property (e.g. a service definition). To merge in all content from below the anchored property, specify `<<: *<anchor>` inside the new property (i.e. one level below it, just like the content sits one level below the anchored property). Merged properties can be overridden by explicitly specifying them. (See the Compose sketch after this list.)
+    - Consider implementing health checks within the DC file if the image does not already implement them (Google it).
+    - Consider putting envvars in a separate env file (specified using `--env-file` on the CLI or `env_file: []` in the DC file).
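+
+A minimal multi-stage Dockerfile sketch illustrating some of the build practices above (a hypothetical Go app; base images, paths and names are illustrative):
+
+```
+# Build stage: full toolchain and many layers, never shipped.
+FROM golang:1.22 AS build
+WORKDIR /src
+# Rarely-changing steps first to exploit layer caching.
+COPY go.mod go.sum ./
+RUN go mod download
+# Frequently-changing steps last.
+COPY . .
+RUN CGO_ENABLED=0 go build -o /app ./cmd/app
+
+# Final stage: small base image with only the built app, owned by root but run as a non-root user (e.g. via "docker run --user").
+FROM alpine:3.19
+COPY --from=build /app /usr/local/bin/app
+# Purely informational; does not publish the port by itself.
+EXPOSE 8080
+ENTRYPOINT ["/usr/local/bin/app"]
+CMD ["--listen", ":8080"]
+```
+
+And a Docker Compose sketch of YAML anchors/aliases as described above (service names and images are illustrative):
+
+```
+x-defaults: &defaults
+  restart: unless-stopped
+  init: true
+
+services:
+  app:
+    <<: *defaults
+    image: example/app:latest
+  db:
+    <<: *defaults
+    image: postgres:16
+```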
+
 ## Miscellanea
 
 ### IPv6 Support
 
-- TL;DR: Docker doesn't prioritize implementing IPv6 properly.
-- While IPv4 uses IPTables filter rules for firewalling and IPTables NAT rules for masquerading and port forwarding, it generally uses no such mechanisms when enabling IPv6 (using `"ipv6": true`). Setting `"ip6tables": true` (disabled by default) is required to mimic the IPv4 behavior of filtering and NAT-ing. To disable NAT masquerading for both IPv4 and IPv6, set `enable_ip_masquerade=false` on individual networks. Disabling NAT masquerading for only IPv6 is not yet possible. (See [moby/moby #13481](https://github.com/moby/moby/issues/13481), [moby/moby #21951](https://github.com/moby/moby/issues/21951), [moby/moby #25407](https://github.com/moby/moby/issues/25407), [moby/libnetwork #2557](https://github.com/moby/libnetwork/issues/2557).)
+- TL;DR: Docker doesn't properly support IPv6.
+- While IPv6 base support may be enabled by setting `"ipv6": true` in the daemon config (disabled by default), it does not add any IP(6)Tables rules for the filter and NAT tables, as it does for IPv4/IPTables. (See [moby/moby #13481](https://github.com/moby/moby/issues/13481), [moby/moby #21951](https://github.com/moby/moby/issues/21951), [moby/moby #25407](https://github.com/moby/moby/issues/25407), [moby/libnetwork #2557](https://github.com/moby/libnetwork/issues/2557).)
+- Using `"ipv6": true` without `"ip6tables": true` means the following for IPv6 subnets on Docker bridge networks (and probably other network types):
+    - The IPv6 subnet must use a routable prefix which is actually routed to the Docker host (unlike IPv4, which uses NAT masquerading by default). While this is more appropriate for typical infrastructures, it may be quite impractical for e.g. home networks.
+    - If you accept forwarded traffic by default (in e.g. IPTables): The IPv6 subnet is not firewalled in any way, leaving it completely open to other networks "on" or "connected to" the Docker host, meaning you need to manually add IPTables rules to limit access to each Docker network.
+    - If you drop/reject forwarded traffic by default (in e.g. IPTables): The IPv6 subnet is completely closed and hosts on the Docker network can't even communicate between themselves (assuming your system filters bridge traffic). To allow intra-network traffic, you need to manually add something like `ip6tables -A FORWARD -i docker0 -o docker0 -j ACCEPT` for each Docker network. To allow for inter-network traffic, you need to manually add rules for that as well.
+- To enable IPv4-like IPTables support (with NAT-ing and firewalling), set `"ip6tables": true` (disabled by default) in the daemon config. If you want to disable NAT masquerading for both IPv4 and IPv6 (while still using the filtering rules provided by `"ip6tables": true`), set `enable_ip_masquerade=false` on individual networks (see the example below). Disabling NAT masquerading for only IPv6 is not yet possible. MACVLANs with external routers will not get automatically NAT-ed.
 - IPv6-only networks (without IPv4) are not supported. (See [moby/moby #32675](https://github.com/moby/moby/issues/32675), [moby/libnetwork #826](https://github.com/moby/libnetwork/pull/826).)
-- IPv6 communication between containers (ICC) on IPv6-enabled bridges with IP6Tables enabled is broken, due to NDP (using multicast) being blocked by IP6Tables. On non-internal bridges it works fine. One workaround is to not use IPv6 on internal bridges or to not use internal bridges. (See [libnetwork/issues #2626](https://github.com/moby/libnetwork/issues/2626).)
+- IPv6 communication between containers (ICC) on IPv6-enabled _internal_ bridges with IP6Tables enabled is broken, due to IPv6 ND being blocked by the applied IP6Tables rules. On non-internal bridges it works fine. One workaround is to not use IPv6 on internal bridges or to not use internal bridges. (See [libnetwork/issues #2626](https://github.com/moby/libnetwork/issues/2626).)
 - The userland proxy (enabled by default, can be disabled) accepts both IPv4 and IPv6 incoming traffic but uses only IPv4 toward containers, which replaces the IPv6 source address with an internal IPv4 address (I'm not sure which), effectively hiding the real address and may bypass certain defences as it's apparently coming from within the local network. It also has other non-IPv6-related problems. (See [moby/moby #11185](https://github.com/moby/moby/issues/11185), [moby/moby #14856](https://github.com/moby/moby/issues/14856), [moby/moby #17666](https://github.com/moby/moby/issues/17666).)
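+
+A sketch of creating an IPv6-enabled (non-internal) bridge network with NAT masquerading disabled through the bridge driver option mentioned above (the subnets and network name are illustrative):
+
+```sh
+docker network create \
+  --ipv6 \
+  --subnet 10.99.0.0/24 \
+  --subnet fd00:dead:beef:1::/64 \
+  -o com.docker.network.bridge.enable_ip_masquerade=false \
+  example-net
+```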
 
 ## Useful Software

+ 64 - 0
config/virt-cont/k8s.md

@@ -0,0 +1,64 @@
+---
+title: Kubernetes
+breadcrumbs:
+- title: Configuration
+- title: Virtualization & Containerization
+---
+{% include header.md %}
+
+Using **Debian**.
+
+## Setup
+
+1. **TODO**
+1. (Optional) Setup command completion:
+    - BASH (per-user): `echo 'source <(kubectl completion bash)' >>~/.bashrc`
+    - ZSH (per-user): `echo 'source <(kubectl completion zsh)' >>~/.zshrc`
+    - More info:
+        - [bash auto-completion (k8s docs)](https://kubernetes.io/docs/tasks/tools/included/optional-kubectl-configs-bash-linux/)
+        - [zsh auto-completion (k8s docs)](https://kubernetes.io/docs/tasks/tools/included/optional-kubectl-configs-zsh/)
+
+## Usage
+
+- Config:
+    - Show: `kubectl config view`
+- Cluster:
+    - Show: `kubectl cluster-info`
+- Nodes:
+    - Show: `kubectl get nodes`
+- Services:
+    - Show: `kubectl get services`
+- Pods:
+    - Show: `kubectl get pods [-A] [-o wide]`
+        - `-A` for all namespaces instead of just the current/default one.
+        - `-o wide` for more info.
+    - Show logs: `kubectl logs <pod> [container]`
+- Manifests:
+    - Show cluster state diff if a manifest were to be applied: `kubectl diff -f <manifest-file>` (see the example manifest below)
+- Events:
+    - Show: `kubectl get events`
+
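+A minimal example manifest (the name and image are illustrative) that can be checked with `kubectl diff -f <file>` and applied with `kubectl apply -f <file>`:
+
+```
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: example-app
+spec:
+  replicas: 2
+  selector:
+    matchLabels:
+      app: example-app
+  template:
+    metadata:
+      labels:
+        app: example-app
+    spec:
+      containers:
+      - name: app
+        image: nginx:1.25
+        ports:
+        - containerPort: 80
+```
+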
+## Minikube
+
+Minikube is local Kubernetes, focusing on making it easy to learn and develop for Kubernetes.
+
+### Setup
+
+1. See: [minikube start (minikube docs)](https://minikube.sigs.k8s.io/docs/start/)
+1. Add `kubectl` symlink: `sudo ln -s $(which minikube) /usr/local/bin/kubectl`
+1. Add command completion: See normal k8s setup instructions.
+
+### Usage
+
+- Generally all of the normal k8s stuff applies.
+- Generally sudo isn't required.
+- Manage minikube cluster:
+    - Start: `minikube start`
+    - Pause (**TODO** what?): `minikube pause`
+    - Stop: `minikube stop`
+    - Delete (all clusters): `minikube delete --all`
+- Set memory limit (requires restart): `minikube config set memory <megabytes>`
+- Start and open web dashboard: `minikube dashboard`
+- Show addons: `minikube addons list`
+
+{% include footer.md %}

+ 1 - 1
config/virt-cont/libvirt-kvm.md

@@ -21,7 +21,7 @@ Using **Debian**.
 
 1. Install without extra stuff (like GUIs): `apt-get install --no-install-recommends iptables bridge-utils qemu-system qemu-utils libvirt-clients libvirt-daemon-system virtinst libosinfo-bin`
 1. (Optional) Install `dnsmasq-base` for accessing guests using their hostnames.
-1. (Optional) Add users to the `libvirt` group to allow them to manage libvirt without sudo.
+1. (Optional) Add users to the `libvirt` group to allow them to manage libvirt without sudo. Otherwise, remember to always use sudo (or explicitly connect to the system URI, e.g. `qemu:///system`) so that commands target the correct libvirt instance.
 1. Set up the default network:
     1. It's already created, using NAT, DNS and DHCP.
     1. If not using dnsmasq, disable DNS and DHCP:

+ 13 - 3
config/virt-cont/proxmox-ve.md

@@ -383,18 +383,20 @@ Check the host system logs. It may for instance be due to hardware changes or st
 - UDP 111: rpcbind (optional).
 - UDP 5404-5405: Corosync (internal).
 
-## Ceph
+## Storage
+
+### Ceph
 
 See [Storage: Ceph](/config/linux-server/storage/#ceph) for general notes.
 The notes below are PVE-specific.
 
-### Notes
+#### Notes
 
 - It's recommended to use a high-bandwidth SAN/management network within the cluster for Ceph traffic.
   It may be the same as used for out-of-band PVE cluster management traffic.
 - When used with PVE, the configuration is stored in the cluster-synchronized PVE config dir.
 
-### Setup
+#### Setup
 
 1. Setup a shared network.
     - It should be high-bandwidth and isolated.
@@ -414,4 +416,12 @@ The notes below are PVE-specific.
     - Use at least size 3 and min. size 2 in production.
     - "Add storage" adds the pool to PVE for disk image and container content.
 
+### Troubleshooting
+
+**"Cannot remove image, a guest with VMID '100' exists!" when trying to remove unused VM disk**:
+
+- Make sure it's not mounted to the VM.
+- Make sure it's not listed as an "unused disk" for the VM.
+- Run `qm rescan --vmid <vmid>` and check the steps above.
+
 {% include footer.md %}

+ 6 - 1
index.md

@@ -40,8 +40,13 @@ Random collection of config notes and miscellaneous stuff. _Technically not a wi
 ### HPC
 
 - [Slurm Workload Manager](/config/hpc/slurm/)
+- [Containers](/config/hpc/containers/)
+- [Singularity](/config/hpc/singularity/)
+- [HIP](/config/hpc/hip/)
+- [ROCm](/config/hpc/rocm/)
 - [CUDA](/config/hpc/cuda/)
 - [Open MPI](/config/hpc/openmpi/)
+- [Interconnects](/config/hpc/interconnects/)
 
 ### IoT & Home Automation
 
@@ -55,7 +60,6 @@ Random collection of config notes and miscellaneous stuff. _Technically not a wi
 - [Storage](/config/linux-server/storage/)
 - [Storage: ZFS](/config/linux-server/storage-zfs/)
 - [Storage: Ceph](/config/linux-server/storage-ceph/)
-- [Networking](/config/linux-server/networking/)
 
 ### Media
 
@@ -104,6 +108,7 @@ Random collection of config notes and miscellaneous stuff. _Technically not a wi
 ### Virtualization & Containerization
 
 - [Docker](/config/virt-cont/docker/)
+- [Kubernetes](/config/virt-cont/k8s/)
 - [libvirt & KVM](/config/virt-cont/libvirt-kvm/)
 - [Proxmox VE](/config/virt-cont/proxmox-ve/)
 

+ 4 - 0
it/services/dns.md

@@ -6,6 +6,10 @@ breadcrumbs:
 ---
 {% include header.md %}
 
+## Resources
+
+- [[RFC 1912] Common DNS Operational and Configuration Errors](https://datatracker.ietf.org/doc/html/rfc1912)
+
 ## Basics
 
 Everyone knows this, no point reiterating.

+ 97 - 25
se/hpc/cuda.md

@@ -6,16 +6,57 @@ breadcrumbs:
 ---
 {% include header.md %}
 
+Introduced by NVIDIA in 2006. While GPU compute was hackishly possible before CUDA through the fixed graphics pipeline, CUDA and CUDA-capable GPUs provided a somewhat more generalized GPU architecture and a programming model for GPU compute.
+
 ### Related Pages
 {:.no_toc}
 
 - [CUDA (configuration)](/config/hpc/cuda/)
 
-## General
+## Hardware Architecture
+
+- Modern CUDA-capable GPUs contain multiple types of cores:
+    - CUDA cores: Aka programmable shading cores.
+    - Tensor cores: Mostly for AI. **TODO** Also _RT cores_ (for ray tracing). Tensor cores may be accessed in CUDA through special CUDA calls, but RT cores are (as of writing) only accessible from OptiX.
+- **TODO** SMs, GPCs, TPCs, warp schedulers, etc.
+
+**TODO** Move "Mapping the Programming Model to the Execution Model" and "Thread Hierarchy" here?
+
+### SMs and Blocks
+
+- During kernel execution, each block gets assigned to a single SM. Multiple blocks may be assigned to the same SM.
+- The maximum number of active blocks per SM is limited by one of the following:
+    - Blocks and warps per SM: Both numbers are defined by the SM. For maximum theoretical occupancy, the blocks must together contain enough threads to fill the maximum number of warps per SM, without exceeding the maximum number of blocks per SM.
+    - Registers per SM: The set of registers in an SM is shared by all active threads (from all active blocks on the SM), meaning that the number of registers used by each thread may limit the occupancy. Since fewer registers per thread may considerably degrade performance, this is the main reason why too high occupancy is generally bad. The register count per thread is by default determined heuristically by the compiler to minimize register spilling to local memory, but `__launch_bounds__` may be used to assist the compiler with the allocation. (See the worked example after this list.)
+    - Shared memory per SM: Shared memory is shared between all threads of a single block (running on a single SM). Allocating a large amount of shared memory per block will limit the number of active blocks on the SM and therefore may implicitly limit occupancy if the blocks have few threads each. Allocating more memory away from shared memory and towards the L1 cache may also contribute to reduced occupancy.
+- A block is considered _active_ from the time its warps have started executing until all of its warps have finished executing.
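+
+A worked example under assumed (illustrative) SM limits, showing how the block size and per-thread register usage bound the number of active blocks and warps:
+
+```
+Assumed SM limits (illustrative): 64 warps, 32 blocks, 65536 registers; warp size 32.
+Block size: 256 threads = 8 warps per block.
+Warp limit:     64 warps / 8 warps per block = 8 blocks -> 64 active warps (100% theoretical occupancy).
+Register limit: at 64 registers per thread, 256 * 64 = 16384 registers per block,
+                65536 / 16384 = 4 blocks -> 32 active warps (50% theoretical occupancy).
+```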
+
+### Warp Schedulers and Warps
+
+- Each SM consists of one or more warp schedulers.
+- Warps consist of up to 32 threads from a block (for all current compute capabilities), i.e. the threads of a block are grouped into warps in order to get executed.
+- A warp is considered active from the point its threads start executing until all threads have finished. SMs have a limit on the number of active warps, meaning the remaining inactive warps will need to wait until the currently active ones have finished executing. The ratio of active warps on an SM to the maximum number of active warps on the SM is called _occupancy_.
+- Each warp scheduler has multiple warp slots which may be active (containing an _active_ warp) or _unused_.
+- At most one warp is _selected_ (see states below) per clock per warp scheduler, which then executes a single instruction.
+- Active warp states:
+    - Stalled: It's waiting for instructions or data to load, or some other dependency.
+    - Eligible: It's ready to get scheduled for execution.
+    - Selected: It's eligible and has been selected for execution during the current clock.
+- **TODO** scheduling policy?
+
+## Programming
+
+#### TODO
 
-- Introduced by NVIDIA in 2006. While GPU compute was possible before through hackish methods, CUDA provided a programming model for compute which included e.g. thread blocks, shared memory and synchronization barriers.
-- Modern NVIDIA GPUs contain _CUDA cores_, _tensor cores_ and _RT cores_ (ray tracing cores). Tensor cores may be accessed in CUDA through special CUDA calls, but RT cores are (as of writing) only accessible from Optix and not CUDA.
 - The _compute capability_ describes the generation and supported features of a GPU. **TODO** More info about `-code`, `-arch` etc.
+- SM processing blocks/partitions same as warp schedulers?
+- SM processing block datapaths.
+
+### General
+
+- Branch divergence: Each SM has only a single control unit for all cores within it, so for all branches any thread takes (in total), the SM and all of its cores will need to go through all of the branches but mask the output for all threads which did not locally take the branch. If no threads take a specific branch, it will not be executed by the SM.
+- Host code and device code: Specifying the `__host__` keyword for a function means that it will be accessible by the host (the default if nothing is specified). Specifying the `__device__` keyword for a function means that it will be accessible by devices. Specifying both means it will be accessible by both.
+- Kernels are specified as functions with the `__global__` keyword (see the minimal sketch below).
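+
+A minimal sketch (illustrative) showing the `__device__`/`__global__` qualifiers, thread indexing and a kernel launch:
+
+```
+#include <cstdio>
+
+// Callable from device code only.
+__device__ float square(float x) { return x * x; }
+
+// Kernel: each thread handles one element.
+__global__ void squareAll(const float *in, float *out, int n) {
+    int i = blockIdx.x * blockDim.x + threadIdx.x;
+    if (i < n) {
+        out[i] = square(in[i]);
+    }
+}
+
+int main() {
+    const int n = 1024;
+    float *in, *out;
+    cudaMallocManaged(&in, n * sizeof(float));
+    cudaMallocManaged(&out, n * sizeof(float));
+    for (int i = 0; i < n; i++) in[i] = (float) i;
+
+    // Launch enough 256-thread blocks to cover all n elements.
+    int blockSize = 256;
+    int gridSize = (n + blockSize - 1) / blockSize;
+    squareAll<<<gridSize, blockSize>>>(in, out, n);
+    cudaDeviceSynchronize();
+
+    printf("out[10] = %f\n", out[10]);
+    cudaFree(in);
+    cudaFree(out);
+    return 0;
+}
+```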
 
 ### Mapping the Programming Model to the Execution Model
 
@@ -26,15 +67,9 @@ breadcrumbs:
 - Each CUDA core within an SM executes a thread from a block assigned to the SM.
 - **TODO** Warps and switches. 32 threads per warp for all current GPUs.
 
-## Programming
-
-### General
+##### Thread Hierarchy
 
-- Branch divergence: Each SM has only a single control unit for all cores within it, so for all branches any thread takes (in total), the SM and all of its cores will need to go through all of the branches but mask the output for all threads which did not locally take the branch. If no threads take a specific branch, it will not be executed by the SM.
-- Host code and device code: Specifying the `__host__` keyword for a function means that it will be accessible by the host (the default if nothing is specified). Specifying the `__device__` keyword for a function means that it will be accessible by devices. Specifying both means it will be accessible by both.
-- Kernels are specified as functions with the `__global__` keyword.
-
-### Thread Hierarchy
+**TODO** Move into section below.
 
 - Grids consist of a number of blocks and blocks consist of a number of threads.
 - Threads and blocks are indexed in 1D, 2D or 3D space (separately), which threads may access through the 3-component vectors `blockDim`, `blockIdx` and `threadIdx`.
@@ -63,6 +98,8 @@ breadcrumbs:
     - The constant and texture memories are cached.
     - The global and local memories are cached in L1 and L2 on newer devices.
     - The register and shared memories are on-chip and fast, so they don't need to be cached.
+- Resource contention:
+    - The pools of registers and shared memory are shared by all active threads in an SM.
 
 #### Register Memory
 
@@ -78,26 +115,30 @@ breadcrumbs:
 
 #### Shared Memory
 
+- Shared between all threads of a block (block-local).
+- The scope is the lifetime of the block.
 - Resides in fast, high-bandwidth on-chip memory.
 - Organized into banks which can be accessed concurrently. Each bank is accessed serially and multiple concurrent accesses to the same bank will result in a bank conflict.
 - Declared using the `__shared__` variable qualifier. The size may be specified during kernel invocation.
-- The scope is the lifetime of the block.
-- **TODO** Shared between?
+- On modern devices, shared memory and the L1 cache reside on the same chip, and the amount of memory allocated to each may be specified in the program.
+- **TODO** Static (`__shared__`) and dynamic (specified during kernel invocation). (A sketch of both is shown below.)
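+
+A sketch (illustrative) of statically vs. dynamically sized shared memory:
+
+```
+// Static: size fixed at compile time.
+__global__ void staticShared() {
+    __shared__ float tile[256];
+    tile[threadIdx.x] = (float) threadIdx.x;
+    __syncthreads();
+}
+
+// Dynamic: size given as the third kernel launch parameter (in bytes).
+__global__ void dynamicShared() {
+    extern __shared__ float tile[];
+    tile[threadIdx.x] = (float) threadIdx.x;
+    __syncthreads();
+}
+
+// Launches:
+// staticShared<<<1, 256>>>();
+// dynamicShared<<<1, 256, 256 * sizeof(float)>>>();
+```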
 
 #### Global Memory
 
 - The largest and slowest memory on the device.
 - Resides in the GPU DRAM.
-- Variables may persist for the lifetime of the application.
-- One of the memories the host can access (outside of kernels).
-- The only memory threads from different blocks can share data in.
+- Per-grid, accessible outside of kernels.
+- Accessible by the host.
+- The only memory threads from different blocks can share stored data in.
 - Statically declared in global scope using the `__device__` declaration or dynamically allocated using `cudaMalloc`.
 - Global memory coalescing: See the section about data alignment.
 
 #### Constant Memory
 
-- Read-only memory. **TODO** And?
+- Read-only memory.
 - Resides in the special constant memory.
+- Per-grid, accessible outside of kernels.
+- Accessible by the host.
 - Declared using the `__constant__` variable qualifier.
 - Multiple/all threads in a warp can access the same memory address simultaneously, but accesses to different addresses are serialized.
 
@@ -203,6 +244,22 @@ breadcrumbs:
 - For getting device attributes/properties, `cudaDeviceGetAttribute` is significantly faster than `cudaGetDeviceProperties`.
 - Use `cudaDeviceReset` to reset all state for the device by destroying the CUDA context.
 
+## Metrics
+
+- Occupancy: The ratio of active warps on an SM to the maximum number of active warps on the SM. Low occupancy generally leads to poor instruction issue efficiency since there may not be enough eligible warps per clock to saturate the warp schedulers. Too high occupancy may also degrade performance as resources may be contended by threads. The occupancy should be high enough to hide memory latencies without causing considerable resource contention, which depends on both the device and application.
+- Theoretical occupancy: Maximum possible occupancy, limited by factors such as warps per SM, blocks per SM, registers per SM and shared memory per SM. This is computed statically without running the kernel.
+- Achieved occupancy (i.e. actual occupancy): Average occupancy of an SM for the whole duration it's active. Measured as the sum of active warps across all warp schedulers of an SM for each clock cycle the SM is active, divided by the number of clock cycles and then again divided by the maximum number of active warps for the SM. In addition to the reasons mentioned for theoretical occupancy, it may be limited due to unbalanced workload within blocks, unbalanced workload across blocks, too few blocks launched, and a partial last wave (meaning that the last "wave" of blocks isn't enough to activate all warp schedulers of all SMs).
+
+## NVLink & NVSwitch
+
+- Interconnect for connecting NVIDIA GPUs and NICs/HCAs as a mesh within a node, because PCIe was too limited.
+- NVLink alone is limited to only eight GPUs, but NVSwitches allow connecting more.
+- A bidirectional "link" consists of two unidirectional "sub-links", which each contain eight differential pairs (i.e. lanes). Each device may support multiple links.
+- NVLink transfer rate per differential pair:
+    - NVLink 1.0 (Pascal): 20Gb/s
+    - NVLink 2.0 (Volta): 25Gb/s
+    - NVLink 3.0 (Ampere): 50Gb/s
+
 ## Tools
 
 ### CUDA-GDB
@@ -230,20 +287,35 @@ breadcrumbs:
 - For debugging and profiling applications.
 - Requires a Turing/Volta or newer GPU.
 - Comes as multiple variants:
-    - Nsight Systems: For general profiling.
-    - Nsight Compute: For compute-specific profiling (CUDA).
+    - Nsight Systems: For general profiling. Provides profiling information along a single timeline. Has less overhead, making it more appropriate for long-running instances with large datasets. May provide clues as to what to look into with Nsight Compute or Graphics.
+    - Nsight Compute: For compute-specific profiling (CUDA). Isolates and profiles individual kernels (**TODO** for a single or all invocations?).
     - Nsight Graphics: For graphics-specific profiling (OpenGL etc.).
     - IDE integrations.
-- Replaces nvprof.
+- The tools may be run either interactively/graphically through the GUIs, or through the command line versions to generate a report which can be loaded into the GUIs.
+
+### Nsight Compute
 
-#### Installation
+#### Info
 
-1. Download the run-files from the website for each variant (System, Compute, Graphics) you want.
-1. Run the run-files with sudo.
+- Requires Turing/Volta or later.
+- Replaces the much simpler nvprof tool.
+- Supports stepping through CUDA calls.
 
-### Nsight Compute
+#### Installation (Ubuntu)
+
+- Nsight Systems and Compute come with CUDA if installed through NVIDIA's repos.
+- If it complains about something Qt, install `libqt5xdg3`.
+- Access to performance counters:
+    - Since access to GPU performance counters is limited to protect against side channel attacks (see [Security Notice: NVIDIA Response to “Rendered Insecure: GPU Side Channel Attacks are Practical” - November 2018 (NVIDIA)](https://nvidia.custhelp.com/app/answers/detail/a_id/4738)), it must be run either with sudo (or a user with `CAP_SYS_ADMIN`), or by setting a module option which disables the protection. For non-sensitive applications (e.g. for teaching), this protection is not required. See [NVIDIA Development Tools Solutions - ERR_NVGPUCTRPERM: Permission issue with Performance Counters (NVIDIA)](https://developer.nvidia.com/nvidia-development-tools-solutions-err_nvgpuctrperm-permission-issue-performance-counters) for more info.
+    - Enable access for all users: Add `options nvidia "NVreg_RestrictProfilingToAdminUsers=0"` to e.g. `/etc/modprobe.d/nvidia.conf` and reboot.
+
+#### Usage
 
-- May be run from command line (`ncu`) or using the graphical application.
+- May be run from the command line (`ncu`) or using the graphical application (`ncu-ui`) (see the example invocations below).
+- Running it may require sudo, `CAP_SYS_ADMIN` or disabling the performance counter protection for the driver module (see the installation note above). If an interactive session ends without results, or the non-interactive/CLI version shows an `ERR_NVGPUCTRPERM` error, this is typically the cause.
+- May be run either in (non-interactive) profile mode or in interactive profile mode (with stepping for CUDA API calls).
+- For each mode, the "sections" (profiling types) to run must be specified. More sections means it takes longer to profile as it may require running the kernel invocations multiple times (aka kernel replaying).
 - Kernel replays: In order to run all profiling methods for a kernel execution, Nsight might have to run the kernel multiple times by storing the state before the first kernel execution and restoring it for every replay. It does not restore any host state, so in case of host-device communication during the execution, this is likely to put the application in an inconsistent state and cause it to crash or give incorrect results. To rerun the whole application (aka "application mode") instead of transparently replaying individual kernels (aka "kernel mode"), specify `--replay-mode=application` (or the equivalent option in the GUI).
+- Supports NVTX (NVIDIA Tools Extension) for instrumenting the application in order to provide context/information around events and certain code.
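+
+Example invocations (a sketch; the report name and application path are placeholders):
+
+```sh
+# Profile the application and write a report file (may require sudo, see above).
+sudo ncu -o example-report ./example-app
+
+# Rerun the whole application instead of replaying individual kernels.
+sudo ncu --replay-mode=application -o example-report ./example-app
+
+# Open the report in the GUI.
+ncu-ui example-report.ncu-rep
+```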
 
 {% include footer.md %}