Merge branch 'master' of github.com:HON95/wiki

Håvard Ose Nordstrand, 4 years ago
parent
commit
c70e4ad37f
8 changed files with 320 additions and 90 deletions
  1. 21 4
      config/hpc/cuda.md
  2. 10 5
      config/linux-server/debian.md
  3. 6 0
      config/linux-server/storage-zfs.md
  4. 8 7
      config/pc/applications.md
  5. 6 4
      config/pc/kubuntu.md
  6. 74 70
      index.md
  7. 87 0
      miscellanea/betzy.md
  8. 108 0
      se/cuda-tmp.md

+ 21 - 4
config/hpc/cuda.md

@@ -20,7 +20,7 @@ NVIDIA CUDA (Compute Unified Device Architecture) Toolkit, for programming CUDA-
 - [CUDA Toolkit Download (NVIDIA)](https://developer.nvidia.com/cuda-downloads)
 - [CUDA GPUs (NVIDIA)](https://developer.nvidia.com/cuda-gpus)
 
-## Installation
+## Setup
 
 ### Linux
 
@@ -34,13 +34,30 @@ If an NVIDIA driver is already installed, it must match the CUDA version.
 
 Downloads: [CUDA Toolkit Download (NVIDIA)](https://developer.nvidia.com/cuda-downloads)
 
-## Usage
+#### Ubuntu (NVIDIA CUDA Repo)
 
-### Programming
+1. Follow the steps to add the NVIDIA CUDA repo: [CUDA Toolkit Download (NVIDIA)](https://developer.nvidia.com/cuda-downloads)
+    - But don't install `cuda` yet.
+1. Remove anything NVIDIA or CUDA from the system to avoid conflicts: `apt purge --autoremove cuda nvidia-* libnvidia-*`
+    - Warning: May break your PC. There may be better ways to do this.
+1. Install CUDA from the new repo (includes the NVIDIA driver): `apt install cuda`
+1. Setup path: In `/etc/environment`, append `:/usr/local/cuda/bin` to the end of the PATH list.
+
+### Docker Containers
+
+- Docker containers may run NVIDIA applications using the NVIDIA runtime for Docker.
+- **TODO**
+
+### DCGM
+
+- For monitoring GPU hardware and performance.
+- See the DCGM exporter for Prometheus for monitoring NVIDIA GPUs from Prometheus.
+
+## Programming
 
 See [CUDA (software engineering)](/config/se/general/cuda.md).
 
-### General Tools
+## Usage and Tools
 
 - Gathering system/GPU information with `nvidia-smi`:
     - Show overview: `nvidia-smi`

+ 10 - 5
config/linux-server/debian.md

@@ -10,6 +10,9 @@ Using **Debian 10 (Buster)**.
 
 ## Basic Setup
 
+If using automation to provision the system, only the "installation" part is necessary.
+If using a hypervisor, the VM may be turned into a template after the "installation" part, so that you only need to do the manual installation once and then clone the template when you need more VMs.
+
 ### Installation
 
 - Always verify the downloaded installation image after downloading it.
@@ -21,7 +24,9 @@ Using **Debian 10 (Buster)**.
     - Locale: United States UTF-8 (`en_US.UTF-8`).
     - Keymap: Your keyboard's keymap.
 - Use an FQDN as the hostname. It'll set both the shortname and the FQDN.
+    - If using automation to manage the system, this doesn't matter.
 - Use separate password for root and your personal admin user.
+    - If using automation to manage the system, the passwords may be something temporary and the non-root user may be called e.g. `ansible` and used for automation.
 - System disk partitioning:
     - "Simple" system: Guided, single partition, use all available space.
     - "Complex" system: Manually partition, see [system storage](/config/linux-server/storage/#system-storage).
@@ -33,7 +38,7 @@ Using **Debian 10 (Buster)**.
 
 ### Reconfigure Clones
 
-If you didn't already configure this during the installation. Typically the case if cloning a template VMs or something.
+If you didn't already configure this during the installation, e.g. if cloning a template VM.
 
 1. Check the system status:
     - Check for failed services: `systemctl --failed`
@@ -55,10 +60,10 @@ If you didn't already configure this during the installation. Typically the case
 ### Basic Configuration
 
 1. Packages:
-    - (Optional) Enable the `contrib` and `non-free` repo areas by setting `main contrib non-free` for every `deb`/`deb-src` in `/etc/apt/sources.list`.
+    - (Optional) Enable the `contrib` and `non-free` repo areas: `add-apt-repository <area>`
+        - Or by setting `main contrib non-free` for every `deb`/`deb-src` in `/etc/apt/sources.list`.
     - Update, upgrade and auto-remove.
-    - Install (essentials): `sudo ca-certificates`
-    - Install (extra): `man-db tree vim screen curl net-tools dnsutils moreutils htop iotop irqtop nmap`
+    - Install: `sudo ca-certificates software-properties-common man-db tree vim screen curl net-tools dnsutils moreutils htop iotop irqtop nmap`
     - (Optional) Install per-user tmpdirs: `libpam-tmpdir`
 1. (Optional) Configure editor (Vim):
     - Update the default editor: `update-alternatives --config editor`
@@ -152,7 +157,7 @@ This is used by default and is the simplest to use for simple setups.
 
 This is the systemd way of doing it and is recommended for more advanced setups as ifupdown is riddled with legacy/compatibility crap.
 
-1. Add a simple network config: Create `/etc/systemd/network/lan.network` based on [main.network](https://github.com/HON95/configs/blob/master/server/linux/networkd/main.network).
+1. Add a simple network config: Create `/etc/systemd/network/lan.network` based on [main.network](https://github.com/HON95/configs/blob/master/networkd/main.network).
 1. Disable/remove the ifupdown config: `mv /etc/network/interfaces /etc/network/interfaces.old`
 1. Enable the service: `systemctl enable --now systemd-networkd`
 1. Purge `ifupdown` and `ifupdown2`.

+ 6 - 0
config/linux-server/storage-zfs.md

@@ -271,6 +271,12 @@ The installation part is highly specific to Debian 10 (Buster). The backports re
     - One app per database.
     - Encode the environment and DBMS version into the dataset name, e.g. `theapp-prod-pg10`.
 
+## Troubleshooting
+
+**"cannot create 'pool': URI scheme is not supported"**:
+
+Reboot.
+
 ## Related Software
 
 ### zfs-auto-snapshot

+ 8 - 7
config/pc/applications.md

@@ -40,16 +40,17 @@ breadcrumbs:
 
 ### Config
 
-- (Linux) Disable middle mouse paste:
-    - Go to `about:config`.
-    - Set `middlemouse.paste` to false.
+- Disable middle mouse paste by setting `middlemouse.paste` to false in `about:config`.
+- Enable middle mouse "drag scrolling" by setting `general.autoScroll` to true in `about:config`.
+- Disable external media keys by setting `media.hardwaremediakeys.enabled` to false in `about:config`.
+- (Linux) Install missing language support: `apt install $(check-language-support)`
 
 ## Git
 
 ### Config
 
 - Location: `~/.gitconfig`
-- [Example](https://github.com/HON95/configs/blob/master/pc/common/gitconfig).
+- [Example](https://github.com/HON95/configs/blob/master/git/config).
 
 ## Nvidia Settings (Linux)
 
@@ -94,7 +95,7 @@ GUI for configuring gaming mice.
 ### Config
 
 - Location: `~/.ssh/config`
-- [Example](https://github.com/HON95/configs/blob/master/pc/common/ssh_config).
+- [Example](https://github.com/HON95/configs/blob/master/pc/ssh/config).
 
 ## Steam (Linux)
 
@@ -106,7 +107,7 @@ GUI for configuring gaming mice.
 - Location:
     - Global: `/etc/vim/vimrc`
     - User: `~/.vimrc`
-- [Example](https://github.com/HON95/configs/blob/master/pc/common/vimrc).
+- [Example](https://github.com/HON95/configs/blob/master/vim/vimrc).
 
 ## VS Code
 
@@ -130,7 +131,7 @@ GUI for configuring gaming mice.
 - Location:
     - Linux: `~/.config/Code/User/settings.json`
     - Windows: `%APPDATA%\Code\User\settings.json`
-- [Example](https://github.com/HON95/configs/blob/master/pc/common/vscode_settings.json).
+- [Example](https://github.com/HON95/configs/blob/master/pc/vscode/settings.json).
 
 ## ZSH (personal) (Linux)
 

+ 6 - 4
config/pc/kubuntu.md

@@ -37,11 +37,13 @@ breadcrumbs:
 1. Setup panels for all screens. Only show tasks for the current screen.
 1. Setup clipboard:
     - Open the clipboard settings from the taskbar.
+    - Select "ignore selection" to avoid copying when selecting text.
     - Set the history size to 1 (effectively disabling the history).
 1. Setup firewall:
-    - Remove other firewalls: `apt purge ufw firewalld`.
-    - Install `iptables iptables-persistent netfilter-persistent`.
-    - Create and run an IPTables script, e.g. [iptables.sh](https://github.com/HON95/configs/blob/master/pc/linux/iptables/iptables.sh).
+    - Remove other firewalls: `sudo apt purge ufw firewalld`.
+    - Install IPTables stuff: `sudo apt install iptables iptables-persistent netfilter-persistent`.
+    - (Alternative 1) Create an IPTables script (e.g. [iptables.sh](https://github.com/HON95/scripts/blob/master/linux/iptables/iptables.sh)).
+    - (Alternative 2) Run my preset (basics only, no SSH etc.): `curl https://raw.githubusercontent.com/HON95/scripts/master/linux/iptables/iptables.sh | sudo bash`
 1. Firefox:
     - Disable middle mouse paste by setting `middlemouse.paste` to false in `about:config`.
     - Enable middle mouse "drag scrolling" by setting `general.autoScroll` to true in `about:config`.
@@ -50,7 +52,7 @@ breadcrumbs:
 
 ### Extra
 
-1. Install applications: See [PC Appluications](/config/pc/applications/).
+1. Install applications: See [PC Applications](/config/pc/applications/).
 1. (Optional) Install encrypted DVD support:
     - Install: `sudo apt install libdvd-pkg && sudo dpkg-reconfigure libdvd-pkg`
     - Warning: Don't change the region if not necessary. It's typically limited to five changes.

+ 74 - 70
index.md

@@ -10,148 +10,152 @@ Random collection of config notes and miscellaneous stuff. _Technically not a wi
 
 ### General
 
-- [General Notes](config/general/general/)
-- [Linux General Notes](config/general/linux-general/)
-- [Linux Examples](config/general/linux-examples/)
-- [Computer Testing](config/general/computer-testing/)
+- [General Notes](/config/general/general/)
+- [Linux General Notes](/config/general/linux-general/)
+- [Linux Examples](/config/general/linux-examples/)
+- [Computer Testing](/config/general/computer-testing/)
 
 ### Authentication, Authorization and Accounting (AAA)
 
-- [Kerberos](config/aaa/kerberos/)
+- [Kerberos](/config/aaa/kerberos/)
 
 ### Automation
 
-- [Ansible](config/automation/ansible/)
-- [Puppet](config/automation/puppet/)
+- [Ansible](/config/automation/ansible/)
+- [Puppet](/config/automation/puppet/)
 
 ### Computers
 
-- [Dell OptiPlex Series](config/computers/dell-optiplex/)
-- [Dell PowerEdge Series](config/computers/dell-poweredge/)
-- [HP ProLiant](config/computers/hp-proliant/)
-- [PCs](config/computers/pcs/)
+- [Dell OptiPlex Series](/config/computers/dell-optiplex/)
+- [Dell PowerEdge Series](/config/computers/dell-poweredge/)
+- [HP ProLiant](/config/computers/hp-proliant/)
+- [PCs](/config/computers/pcs/)
 
 ### Game Servers
 
-- [Counter-Strike: Global Offensive (CS:GO)](config/game-server/csgo/)
-- [Minecraft (Bukkit)](config/game-server/minecraft-bukkit/)
-- [Team Fortress 2 (TF2)](config/game-server/tf2/)
+- [Counter-Strike: Global Offensive (CS:GO)](/config/game-server/csgo/)
+- [Minecraft (Bukkit)](/config/game-server/minecraft-bukkit/)
+- [Team Fortress 2 (TF2)](/config/game-server/tf2/)
 
 ### HPC
 
-- [CUDA](config/hpc/cuda/)
-- [Open MPI](config/hpc/openmpi/)
-- [Slurm Workload Manager](config/hpc/slurm/)
+- [CUDA](/config/hpc/cuda/)
+- [Open MPI](/config/hpc/openmpi/)
+- [Slurm Workload Manager](/config/hpc/slurm/)
 
 ### IoT & Home Automation
 
-- [Raspberry Pi](config/iot-ha/raspberry-pi/)
-- [Home Assistant](config/iot-ha/home-assistant/)
+- [Raspberry Pi](/config/iot-ha/raspberry-pi/)
+- [Home Assistant](/config/iot-ha/home-assistant/)
 
 ### Linux Server
 
-- [Debian](config/linux-server/debian/)
-- [Applications](config/linux-server/applications/)
-- [Storage](config/linux-server/storage/)
-- [Storage: ZFS](config/linux-server/storage-zfs/)
-- [Storage: Ceph](config/linux-server/storage-ceph/)
-- [Networking](config/linux-server/networking/)
+- [Debian](/config/linux-server/debian/)
+- [Applications](/config/linux-server/applications/)
+- [Storage](/config/linux-server/storage/)
+- [Storage: ZFS](/config/linux-server/storage-zfs/)
+- [Storage: Ceph](/config/linux-server/storage-ceph/)
+- [Networking](/config/linux-server/networking/)
 
 ### Media
 
-- [Media Ripping](config/media/ripping/)
-- [Video Streaming](config/media/streaming/)
+- [Media Ripping](/config/media/ripping/)
+- [Video Streaming](/config/media/streaming/)
 
 ### Network
 
 #### General
 
-- [Routing](config/network/routing/)
-- [Switching](config/network/switching/)
-- [WLAN](config/network/wlan/)
-- [Security](config/network/security/)
+- [Routing](/config/network/routing/)
+- [Switching](/config/network/switching/)
+- [WLAN](/config/network/wlan/)
+- [Security](/config/network/security/)
 
 #### Specific
 
-- [Brocade FastIron Switches](config/network/brocade-fastiron-switches/)
-- [Cisco Hardware](config/network/cisco-hardware/)
-- [Cisco IOS General](config/network/cisco-ios-general/)
-- [Cisco IOS Routers](config/network/cisco-ios-routers/)
-- [Cisco IOS Switches](config/network/cisco-ios-switches/)
-- [FS FSOS Switches](config/network/fs-fsos-switches/)
-- [Juniper Hardware](config/network/juniper-hardware/)
-- [Juniper Junos General](config/network/juniper-junos-general/)
-- [Juniper Junos Switches](config/network/juniper-junos-switches/)
-- [Linksys LGS Switches](config/network/linksys-lgs/)
-- [Linux Switching & Routing](config/network/linux/)
-- [pfSense](config/network/pfsense/)
-- [TP-Link JetStream Switches](config/network/tplink-jetstream-switches/)
-- [Ubiquiti UniFi Controllers](config/network/ubiquiti-unifi-controllers/)
-- [Uniquiti UniFi Access Points](config/network/ubiquiti-unifi-aps/)
-- [VyOS](/config/network/vyos/)
+- [Brocade FastIron Switches](/config/network/brocade-fastiron-switches/)
+- [Cisco Hardware](/config/network/cisco-hardware/)
+- [Cisco IOS General](/config/network/cisco-ios-general/)
+- [Cisco IOS Routers](/config/network/cisco-ios-routers/)
+- [Cisco IOS Switches](/config/network/cisco-ios-switches/)
+- [FS FSOS Switches](/config/network/fs-fsos-switches/)
+- [Juniper Hardware](/config/network/juniper-hardware/)
+- [Juniper Junos General](/config/network/juniper-junos-general/)
+- [Juniper Junos Switches](/config/network/juniper-junos-switches/)
+- [Linksys LGS Switches](/config/network/linksys-lgs/)
+- [Linux Switching & Routing](/config/network/linux/)
+- [pfSense](/config/network/pfsense/)
+- [TP-Link JetStream Switches](/config/network/tplink-jetstream-switches/)
+- [Ubiquiti UniFi Controllers](/config/network/ubiquiti-unifi-controllers/)
+- [Ubiquiti UniFi Access Points](/config/network/ubiquiti-unifi-aps/)
+- [VyOS](/config/network/vyos/)
 
 ### PC
 
-- [Kubuntu](config/pc/kubuntu/)
-- [Windows](config/pc/windows/)
-- [PC Applications](config/pc/applications/)
+- [Kubuntu](/config/pc/kubuntu/)
+- [Windows](/config/pc/windows/)
+- [PC Applications](/config/pc/applications/)
 
 ### Power
 
-- [APC PDUs](config/power/apc-pdus/)
+- [APC PDUs](/config/power/apc-pdus/)
 
 ### Virtualization & Containerization
 
-- [Docker](config/virt-cont/docker/)
-- [libvirt & KVM](config/virt-cont/libvirt-kvm/)
-- [Proxmox VE](config/virt-cont/proxmox-ve/)
+- [Docker](/config/virt-cont/docker/)
+- [libvirt & KVM](/config/virt-cont/libvirt-kvm/)
+- [Proxmox VE](/config/virt-cont/proxmox-ve/)
 
 ## Information Technology
 
 ### Network
 
-- [IPv4](it/network/ipv4/)
-- [IPv6](it/network/ipv6/)
-- [Network Architecture](it/network/architecture/)
-- [Switching](it/network/switching/)
-- [Routing](it/network/routing/)
-- [Wireless Basics](it/network/wireless-basics/)
-- [WLAN](it/network/wlan/)
+- [IPv4](/it/network/ipv4/)
+- [IPv6](/it/network/ipv6/)
+- [Network Architecture](/it/network/architecture/)
+- [Switching](/it/network/switching/)
+- [Routing](/it/network/routing/)
+- [Wireless Basics](/it/network/wireless-basics/)
+- [WLAN](/it/network/wlan/)
 
 ### Services
 
-- [Email](it/services/email/)
-- [DNS](it/services/dns/)
+- [Email](/it/services/email/)
+- [DNS](/it/services/dns/)
 
 ## Media
 
 ### Audio
 
-- [Audio Basics](media/audio/basics/)
+- [Audio Basics](/media/audio/basics/)
 
 ## Software Engineering
 
 ### General
 
-- [Database Management Systems (DBMSes)](se/general/dbmses/)
-- [Software Licensing](se/general/licensing/)
+- [Database Management Systems (DBMSes)](/se/general/dbmses/)
+- [Software Licensing](/se/general/licensing/)
 
 ### HPC
 
-- [CUDA](se/general/cuda/)
+- [CUDA](/se/general/cuda/)
 
 ### Web
 
-- [Web Security](se/web/security/)
+- [Web Security](/se/web/security/)
 
 ## Guides
 
 ### Mining
 
-- [Headless Linux ETH Mining](guides/mining/headless-linux-eth-mining/)
+- [Headless Linux ETH Mining](/guides/mining/headless-linux-eth-mining/)
 
 ### Network
 
-- [Juniper EX3300 Fan Mod](guides/network/juniper-ex3300-fanmod/)
+- [Juniper EX3300 Fan Mod](/guides/network/juniper-ex3300-fanmod/)
+
+## Miscellanea
+
+- [Betzy (Supercomputer)](/miscellanea/betzy/)
 
 {% include footer.md %}

+ 87 - 0
miscellanea/betzy.md

@@ -0,0 +1,87 @@
+---
+title: Betzy (Supercomputer)
+breadcrumbs:
+- title: Miscellanea
+---
+{% include header.md %}
+
+Norway's most powerful supercomputer from 2020, managed by UNINETT Sigma2.
+
+## Specifications
+
+A mix of general XH2000 specifications and specific Betzy specifications.
+
+- Betzy overall specifications \[1\]\[2\]\[5\]\[9\]\[10\]:
+    - System: Atos BullSequana XH2000 with X2410 (AMD) and X2415 (A100) blades.
+    - OS: RHEL
+    - Compute nodes: 1’344x X2410 (AMD) + 4x X2415 (A100)
+    - CPUs total (excluding A100 nodes): 2’688 sockets, 172’032 cores, 344’064 threads
+    - Memory: 336TiB total (excluding A100 nodes)
+    - Storage: 7.8PB (2.5PB before 2021 upgrade), DNN powered, Lustre, 51GB/s bandwidth, 500k+ metadata OPS
+    - Interconnect topology: DragonFly+ topology
+    - Queueing system: Slurm
+    - Footprint: 14.78m2 (before 2021 upgrade)
+    - Power: 952kW, 95% of heat captured to water (before 2021 upgrade)
+    - Cooling: Liquid cooled
+- CPU specifications (excluding A100 nodes) \[1\]\[2\]\[3\]:
+    - AMD Epyc 7742
+    - 64 cores, 128 threads
+    - Clock: 2.25GHz base, 3.4GHz max boost
+    - PCIe: 4.0, x128
+    - Memory: DDR4, 8 channels, 3200MHz, 204.8GB/s per socket BW
+    - Supports AVX and AVX2.
+- Compute node specifications (excluding A100 nodes) \[1\]\[2\]:
+    - CPUs per node: 2 sockets, 128 cores, 256 threads
+    - Memory: 256GiB, split into 8 NUMA nodes
+    - Storage: 3x SATA or NVMe M.2 drives
+    - NIC: InfiniBand HDR 100
+- Blade specifications \[9\]:
+    - Betzy uses mainly X2410 blades (AMD), but also 4x X2415 blades (A100) (after the 2021 upgrade) \[10\].
+    - Size: 1U
+    - Cooling: Fanless, active liquid cooling.
+    - All blade types (both used and not used):
+        - X2410: 3x AMD EPYC Rome/Milan nodes (side-by-side) (6 CPUs total).
+        - X2415: 2x AMD EPYC Rome/Milan CPUs and 4x Nvidia A100 SXM4 GPUs (single node).
+        - X1120: 3x Intel Xeon nodes (side-by-side) (6 CPUs total).
+        - X1125: 2x Intel CPUs and 4x Nvidia V100 SXM2 GPUs (single node).
+- Cabinet specifications \[8\]\[9\]:
+    - Number of blades: 4-20 in front, 4-12 in back
+    - Management switches:
+        - Up to 2.
+        - Up to 48 1Gb/s or 10Gb/s ports.
+    - Interconnect switches:
+        - Up to 10.
+        - InfiniBand HDR100: 80 ports, 100Gb/s (Betzy)
+        - Alternative technologies:
+            - Bull eXascale Interconnect (BXI): 48 ports, 100Gb/s
+            - High-speed Ethernet: Up to 48 ports, up to 100Gb/s
+    - Topology: DragonFly+ (Betzy)
+    - Alternative topologies:
+        - Full Fat Tree
+    - PSU: 6x 15kW shelves
+    - Power input: 3x 63A 3-phase 400V (for EU)
+    - Cooling:
+        - Direct Liquid Cooling (DLC)
+        - Hydraulic chassis (HYC)
+        - Primary (external) loop connected to customer water loop.
+        - Secondary (internal) loop connected to blades, management switches, interconnect switches and PSUs.
+
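The cluster-wide totals in the specifications can be derived from the per-node figures listed above. The following is a minimal Python sketch of that arithmetic; the constant names are made up for illustration, every input is copied from the list, and the 8 bytes per transfer per DDR4 channel is the standard 64-bit channel width rather than something stated in the list.

```python
# Cross-check of the Betzy totals from the per-node figures in the specifications.
# All inputs are copied from the list above; nothing here is measured.

NODES = 1344               # X2410 (AMD) compute nodes, excluding the A100 nodes
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 64      # AMD Epyc 7742
THREADS_PER_CORE = 2       # 128 threads per 64-core socket
MEM_PER_NODE_GIB = 256

total_sockets = NODES * SOCKETS_PER_NODE
total_cores = total_sockets * CORES_PER_SOCKET
total_threads = total_cores * THREADS_PER_CORE
total_mem_tib = NODES * MEM_PER_NODE_GIB / 1024

# DDR4-3200: 3200 MT/s, 8 bytes per transfer per channel, 8 channels per socket.
mem_bw_gb_s = 3200e6 * 8 * 8 / 1e9

print("sockets:", total_sockets)
print("cores:", total_cores)
print("threads:", total_threads)
print("memory (TiB):", total_mem_tib)
print("memory bandwidth per socket (GB/s):", mem_bw_gb_s)
```

The same pattern (per-node value times node count) applies to any other per-node figure in the list.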
+## History
+
+- 7 December 2020: Inauguration. \[11\]
+- April 2021 (unknown date): Four new X2415 blades (A100) and 5.3PB more storage (from 2.5PB to 7.8PB). \[10\]
+
+## References
+
+- \[1\] UNINETT Sigma2. "Betzy." (Accessed 2020-09-03.) https://documentation.sigma2.no/hpc_machines/betzy.html
+- \[2\] UNINETT Sigma2. "Betzy Pilot Projects." (Accessed 2020-09-03.) https://documentation.sigma2.no/hpc_machines/betzy/betzy_pilot.html
+- \[3\] SPEC CPU 2017 Integer Rate Result for Atos BullSequana XH2000 (1 socket)
+- \[4\] AMD EPYC 7742. (Accessed 2020-09-03.) https://www.amd.com/en/products/cpu/amd-epyc-7742
+- \[5\] Atos. "Atos to deliver most powerful supercomputer in Norway to national e-infrastructure provider Uninett Sigma2." (Accessed 2020-09-03.) https://atos.net/en/2019/press-release_2019_06_06/atos-to-deliver-most-powerful-supercomputer-in-norway-to-national-e-infrastructure-provider-uninett-sigma2
+- \[6\] Atos. "Atos expands BullSequana X supercomputer range to include AMD processors." (Accessed 2020-09-03.) https://atos.net/en/2018/news_2018_11_12/atos-expands-bullsequana-x-supercomputer-range-include-amd-processors
+- \[8\] Atos. "BullSequana XH2000 brochure." (Accessed 2020-09-03.) https://atos.net/wp-content/uploads/2019/11/BullSequana_XH2000_Brochure_Atos.pdf
+- \[9\] Atos. "BullSequana XH2000 features." (Accessed 2020-09-03.) https://atos.net/wp-content/uploads/2020/07/BullSequanaXH2000_Features_Atos_supercomputers.pdf
+- \[10\] Digi.no. "Sigma2 skal utvide to av de norske superdatamaskinene." (Accessed 2021-04-21.) https://www.digi.no/artikler/sigma2-skal-utvide-to-av-de-norske-superdatamaskinene/509303
+- \[11\] UNINETT Sigma2. "Betzy Inauguration." (Accessed 2021-04-21.) https://www.sigma2.no/betzy-inauguration
+
+{% include footer.md %}

+ 108 - 0
se/cuda-tmp.md

@@ -0,0 +1,108 @@
+## General
+
+- Introduced by NVIDIA in 2006. While GPU compute was possible before through hackish methods, CUDA provided a programming model for compute which included e.g. thread blocks, shared memory and synchronization barriers.
+- Modern NVIDIA GPUs contain _CUDA cores_, _tensor cores_ and _RT cores_ (ray tracing cores). Tensor cores may be accessed in CUDA through special CUDA calls, but RT cores are (as of writing) only accessible from OptiX and not CUDA.
+- The _compute capability_ describes the generation and supported features of a GPU.
+
+### Mapping the Programming Model to the Execution Model
+
+- The programmer decides the grid size (number of blocks and threads therein) when launching a kernel.
+- The device has a constant number of streaming multiprocessors (SMs) and CUDA cores (not to be confused with tensor cores or RT cores).
+- Each kernel launch (or rather its grid) is executed by a single GPU. To use multiple GPUs, multiple kernel launches are required by the CUDA application.
+- Each thread block is executed by a single SM and is bound to it for the entire execution. Each SM may execute multiple thread blocks.
+- Each CUDA core within an SM executes a thread from a block assigned to the SM.
+- **TODO** Warps and switches.
+
+## Programming
+
+### General
+
+- Branching:
+    - **TODO** How branching works and why it's bad.
+
+### Thread Hierarchy
+
+- Grids consist of a number of blocks and blocks consist of a number of threads.
+- Threads and blocks are indexed in 1D, 2D or 3D space (separately), which threads may access through the 3-component vectors `blockDim`, `blockIdx` and `threadIdx`.
+- The programmer decides the number of grids, blocks and threads to use, as well as the number of dimensions to use, for each kernel invocation.
+- The number of threads per block is typically limited to 1024.
+- See the section about mapping it to the execution model for a better understanding of why it's organized this way.
+
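To make the indexing concrete, here is a small host-side emulation in plain Python (not CUDA code) of how a 1D grid of blocks covers `N` elements. The helper names `grid_size` and `global_indices` are hypothetical; the index expression mirrors the commonly used `blockIdx.x * blockDim.x + threadIdx.x`, and the block size of 256 is just an example value below the 1024 limit.

```python
# Host-side emulation (plain Python, not CUDA) of a 1D grid covering n elements.

def grid_size(n, block_dim):
    """Number of blocks needed so that blocks * block_dim >= n (ceiling division)."""
    return (n + block_dim - 1) // block_dim


def global_indices(n, block_dim):
    """Yield the global element index each emulated thread would handle."""
    for block_idx in range(grid_size(n, block_dim)):
        for thread_idx in range(block_dim):
            # Equivalent of blockIdx.x * blockDim.x + threadIdx.x in a kernel.
            i = block_idx * block_dim + thread_idx
            # Bounds guard: threads past the end of the data in the last block do nothing.
            if i < n:
                yield i


N = 1000
BLOCK_DIM = 256
covered = list(global_indices(N, BLOCK_DIM))
print("blocks:", grid_size(N, BLOCK_DIM))
print("elements covered:", len(covered))
```

With these example values the last block is only partially used, which is why real kernels typically carry the same bounds check.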
+### Memory Hierarchy
+
+- **TODO**
+- Memories (local to global):
+    1. **TODO** Fix, these names are wrong.
+    1. Registers.
+    1. Shared memory (block cache).
+    1. Read-only memories.
+    1. SM cache.
+    1. Global memory.
+
+### Synchronization
+
+- **TODO**
+- `__syncthreads` (device) provides block level barrier synchronization.
+- Grid level barrier synchronization is currently not possible through any native API call.
+- `cudaDeviceSynchronize`/`cudaStreamSynchronize` (host) blocks until the device or stream has finished all tasks (kernels/copies/etc.).
+
+### Measurements
+
+#### Time
+
+- To measure the total duration of a kernel invocation or memory copy on the CPU side, measure the duration from before the call to after it, including a `cudaDeviceSynchronize()` if the call is asynchronous.
+- To measure durations on the device side (e.g. the execution time of a kernel), use the CUDA event API (as used in this section hereafter).
+- Events are created and destroyed using `cudaEventCreate(cudaEvent_t *)` and `cudaEventDestroy(cudaEvent_t)`.
+- Events are recorded (captured) using `cudaEventRecord`. This will capture the state of the stream it's applied to. The "time" of the event is when all previous tasks have completed and not the time it was called.
+- Elapsed time between two events is calculated using `cudaEventElapsedTime`.
+- Wait for an event to complete (or happen) using `cudaEventSynchronize`. For an event to "complete" means that the previous tasks (like a kernel) have finished executing. If the `cudaEventBlockingSync` flag is set for the event, the CPU thread will block while waiting (which yields the CPU), otherwise it will busy-wait.
+
+#### Bandwidth
+
+- To calculate the theoretical bandwidth, check the hardware specifications for the device, specifically the memory clock rate, the bus width and whether the memory is DDR (two transfers per clock cycle).
+- To measure the effective bandwidth, divide the sum of the read and written data by the measured total duration of the transfers.
+
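The two bandwidth figures can be computed as in the following Python sketch. The function names are hypothetical, and the input numbers (a 1546 MHz memory clock, a 384-bit bus, a 0.1 ms transfer of 2^20 floats in each direction) are illustrative values, not measurements from any particular device.

```python
# Theoretical vs. effective memory bandwidth, in GB/s (1 GB = 1e9 bytes).

def theoretical_bandwidth_gb_s(mem_clock_hz, bus_width_bits, ddr=True):
    """Peak bandwidth from the memory clock rate, bus width and DDR factor."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_clock = 2 if ddr else 1
    return mem_clock_hz * bytes_per_transfer * transfers_per_clock / 1e9


def effective_bandwidth_gb_s(bytes_read, bytes_written, seconds):
    """Achieved bandwidth: total bytes moved divided by the measured duration."""
    return (bytes_read + bytes_written) / seconds / 1e9


# Illustrative inputs only (assumed, not taken from a real measurement).
peak = theoretical_bandwidth_gb_s(mem_clock_hz=1546e6, bus_width_bits=384)

n = 1 << 20                 # number of 4-byte floats read and written
elapsed_s = 1e-4            # assumed measured duration
effective = effective_bandwidth_gb_s(n * 4, n * 4, elapsed_s)

print("theoretical GB/s:", peak)
print("effective GB/s:", effective)
print("utilization:", effective / peak)
```

Comparing the effective figure against the theoretical one gives a rough indication of how close a kernel or copy is to saturating device memory.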
+#### Computational Throughput
+
+- Measured in FLOPS (or "FLOP/s" or "flops"), separately for the type of precision (half, single, double).
+- Measured by manually counting how many floating-point operations (FLOPs) a compound operation consists of, multiplying by how many times it was performed, and dividing by the total duration.
+- Make sure it's not memory bound (or label it as so).
+
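As a worked example of the throughput calculation, consider an operation of the form `y[i] = a * x[i] + y[i]`, which costs two floating-point operations per element (one multiply, one add). The Python sketch below uses a hypothetical helper `throughput_flops` and an assumed duration; the numbers are illustrative, not measured.

```python
# Computational throughput: FLOPs per operation * number of operations / duration.

def throughput_flops(flop_per_op, op_count, seconds):
    """Return the achieved throughput in FLOP/s."""
    return flop_per_op * op_count / seconds


FLOP_PER_ELEMENT = 2        # one multiply and one add per element
n = 1 << 20                 # number of elements processed
elapsed_s = 1e-4            # assumed measured total duration

gflops = throughput_flops(FLOP_PER_ELEMENT, n, elapsed_s) / 1e9
print("GFLOP/s:", gflops)
```

An operation this simple moves several bytes per FLOP, so in practice it is likely to be memory bound and should be labeled as such.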
+### Unified Virtual Addressing (UVA)
+
+- Causes CUDA to use a single address space for allocations for both the host and devices (if the host supports it).
+- Allows using `cudaMemcpy` without having to specify on which device (or host) and in which memory the pointer resides.
+- Allows _zero-copy_ memory where the GPU can access pinned/managed host memory over the PCIe interconnect (including the high latency for accessing off-device memory).
+
+### Unified Memory
+
+- Depends on the older UVA, which provides a single address space for both the host and devices, as well as zero-copy memory.
+- Virtually combines the pinned CPU/host memory and the GPU/device memory such that explicit memory copying between the two is no longer needed. Both the host and device may access the memory through a single pointer and data is automatically migrated (prefetched) between the two instead of demand-fetching it each time it's used (as for UVA).
+- Data migration happens automatically at page-level granularity and follows pointers in order to support deep copies. As it automatically migrates data to/from the devices instead of accessing it over the PCIe interconnect on demand, it yields much better performance than UVA.
+- As Unified Memory uses paging, it implicitly allows oversubscribing GPU memory.
+- Keep in mind that GPU page faulting will affect kernel performance.
+- Unified Memory also provides support for system-wide atomic memory operations, for multi-GPU cooperative programs.
+- Explicit memory management may still be used for optimization purposes, although use of streams and async copying is typically needed to actually increase the performance.
+- `cudaMallocManaged` and `cudaFree` are used to allocate and deallocate managed memory.
+- Since unified memory removes the need for `cudaMemcpy` when copying data back to the host after the kernel is finished, you may use e.g. `cudaDeviceSynchronize` to wait for the kernel to finish before accessing the managed data.
+- While the Kepler and Maxwell architectures support a limited version of Unified Memory, the Pascal architecture is the first with hardware support for page faulting and migration via its Page Migration Engine. For the pre-Pascal architectures, _all_ managed data is automatically copied to the GPU right before launching a kernel on it, since they don't support page faulting for managed data currently present on the host or another device. This also means that Pascal and later includes memory copy delays in the kernel run time while pre-Pascal does not as everything is migrated before it begins executing (increasing the overall application runtime). This also prevents pre-Pascal GPUs from accessing managed data from both CPU and GPU concurrently (without causing segfaults) as it can't assure data coherence (although care must still be taken to avoid race conditions and data in invalid states for Pascal and later GPUs).
+- Explicit prefetching may be used to assist the data migration through the `cudaMemPrefetchAsync` call.
+
+### Streams
+
+- **TODO**
+- If no stream is specified, it defaults to stream 0, aka the "null stream".
+
+## Tools
+
+**TODO** Add stuff from other document.
+
+### Nsight
+
+- For debugging and profiling applications.
+- Comes as multiple variants:
+    - Nsight Systems: For general applications. Should also be used for CUDA and graphics applications.
+    - Nsight Compute: For CUDA applications.
+    - Nsight Graphics: For graphical applications.
+    - IDE integration.
+- Replaces nvprof.