HON95, 2 years ago
commit 82659e5f7c
4 changed files with 154 additions and 16 deletions
  1. hpc/cuda.md (+15 -9)
  2. index.md (+1 -0)
  3. media/dmx512.md (+21 -7)
  4. se/gcc.md (+117 -0)

+ 15 - 9
hpc/cuda.md

@@ -125,9 +125,11 @@ See the programming section for more info about them.
 
 #### GPUDirect Async
 
-- **TODO** Unknown if this is implemented and used.
-- Provides inter-node GPU control communication, to avoid having the CPU poll the GPU and HCA and schedule the next action.
-- Meant to be used together with GPUDirect RDMA (for inter-node GPU data communication).
+- Provides inter-node GPU control communication, to avoid having the CPU poll the GPU and HCA and schedule the next action when ready (i.e. removing the CPU from the critical path).
+- The CPU still _prepares_ the network communication, but the GPU now _triggers_/_schedules_ it when ready.
+- Meant to be used together with GPUDirect RDMA, such that the GPU tells the HCA directly to "do RDMA" to/from GPU memory.
+- As an example, a ping-pong test over InfiniBand would previously require e.g. MPI send/receive in CPU code. With GPUDirect Async, all the send/receive may be moved into the GPU kernel, replacing MPI with e.g. LibGDSync/LibMP.
+- [LibGDSync](https://github.com/gpudirect/libgdsync) implements GPUDirect Async support on InfiniBand verbs. [LibMP](https://github.com/gpudirect/libmp) is a technology demonstrator based on LibGDSync.
 
 #### GPUDirect Storage
 
@@ -323,10 +325,15 @@ See the programming section for more info about them.
 
 ### CUDA-Aware MPI
 
-- Allows using CUDA pointers in MPI communication, e.g. to transfer memory directly to/from device memory instead of copying it to the host first.
-- Requires CUDA 5 and a Kepler-class GPU or newer. May require a Tesla or Quadro GPU (at least for Kepler).
-- Requires a supported MPI implementation. e.g. Open MPI 1.7 or later.
-- Uses GPUDirect RDMA internally.
+- Provides memory transfers directly to/from device/GPU memory instead of copying it through the host/CPU.
+- Based on UVA plus GPUDirect P2P (intra-node) and GPUDirect RDMA (inter-node).
+- Requirements:
+    - CUDA 5 and a Kepler-class GPU or newer. May require a Tesla or Quadro GPU (at least for Kepler).
+    - A supported MPI implementation, e.g. Open MPI 1.7 or later.
+    - See the GPUDirect P2P and RDMA sections for info and requirements. (GPUDirect RDMA/P2P is optimal but not required.)
+- Implicitly allows using any UVA pointers directly in MPI calls, regardless of where the allocation resides.
+- If GPUDirect P2P or RDMA is _not_ available, the buffer will be copied through host memory, typically through both a CUDA (pinned) and a fabric buffer.
+- If the MPI implementation is _not_ CUDA-aware, the buffer will be copied through host memory, typically through a CUDA buffer, a host buffer and a fabric buffer. An explicit `cudaMemcpy` is required. CUDA streams and async copying should be used.
 
 ### Miscellanea
 
@@ -368,8 +375,7 @@ See the programming section for more info about them.
 
 - Gathering system/GPU information with `nvidia-smi`:
     - Show overview: `nvidia-smi`
-    - Show topology matrix: `nvidia-smi topo --matrix`
-    - Show topology info: `nvidia-smi topo <option>`
+    - Show topology info: `nvidia-smi topo <option>` (e.g. `--matrix`)
     - Show NVLink info: `nvidia-smi  nvlink --status -i 0` (for GPU #0)
     - Monitor device stats: `nvidia-smi dmon`
 - To specify which devices are available to the CUDA application and in which order, set the `CUDA_VISIBLE_DEVICES` env var to a comma-separated list of device IDs.

+ 1 - 0
index.md

@@ -138,6 +138,7 @@ _(Alphabetically sorted, so it might seem a bit strange.)_
 
 - [Data Stuff](/se/data/)
 - [Database Management Systems (DBMSes)](/se/dbmses/)
+- [GNU Compiler Collection (GCC)](/se/gcc/)
 - [Go](/se/go/)
 - [Licensing](/se/licensing/)
 - [Web Security](/se/web-security/)

+ 21 - 7
media/dmx512.md

@@ -7,10 +7,10 @@ breadcrumbs:
 
 ## General
 
-- Used for controlling stage equipment like RGB(WA) lights and stuff.
-- "DMX512" means digital multiplex with 512 addresses.
+- Used for controlling stage equipment like RGB(WA&hellip;) lights, moving heads, fog/haze/smoke machines and stuff.
+- "DMX512" means digital multiplex with 512 addresses/elements. The ANSI standard, based on the older USITT standard "DMX512", is known as "DMX512-A".
 - RDM (remote device management) is a separate protocol used to remotely configure "menu settings" (like the DMX address) on devices.
-- Art-Net is a separate protocol used to transfer DMX512 and RDM over an Ethernet network. (sACN is similar.)
+- Art-Net is a separate protocol used to transfer DMX512 and RDM over an IP network. sACN is a related protocol, often used together with Art-Net.
 
 ## Protocol
 
@@ -28,14 +28,28 @@ breadcrumbs:
 - Units are addressed using a pre-configured address. Multiple units may use the same address if they should be controlled using the same address and are roughly channel-compatible.
 - When connecting multiple units to the same controller, the units must be daisy chained together (unless using some fancy splitter). A maximum of 32 units may be chained (according to the specification). Units typically have one input and one output for proper daisy chaining (even if they have multiple outputs, you should only use one).
 - Long lines (sum of all cables in the daisy chain, about 150-300m) should generally always be terminated using a DMX terminator for best possible signal integrity. For short runs it will often work without a terminator (until it won't). This is in order to reduce reflected signals (inter-symbol interference from signals bouncing back from the end) and noise. The terminator is a connector containing a 120ohm resistor (depending on the characteristic impedance of the cable, but typically 120ohm).
-- Although both audio and DMX (may) use cables with XLR3 connectors (DMX should use XLR5, though), DMX should use cables specific to DMX to meet the DMX/RS-485 cable requirements (i.e. not mic cables).
+- Although both audio and DMX (may) use cables with XLR3 connectors (DMX should use XLR5, though), DMX should use cables specific to DMX to meet the DMX/RS-485 cable requirements (i.e. not mic/speaker cables).
 - For less expensive cabling for permanent setups, Cat5e or Cat6 may be used instead, with 1-4 universes per cable. Note that it should be terminated according to ESTA's DMX512-over-Cat5 specification, see [this](https://support.etcconnect.com/ETC/FAQ/DMX_Over_Cat5).
 - Using proper cabling, it may sometimes achieve data rates over 10Mb/s and distances over 1000m.
-- To run multiple DMX universes over a large area, Art-Net (or sACN) may be used to run DMX over Ethernet.
+- To run multiple DMX universes over a large area, Art-Net (or sACN) may be used to run DMX over IP. Special entertainment-class switches are recommended for this, but normal unmanaged and managed switches are typically good enough.
 
-## RDM
+## Art-Net 4
 
-- "RDM" means remote device management and is a protocol used to remotely configure "menu settings" (like the DMX address) on devices.
+- Used for sending DMX512 and RDM over Ethernet/IP/UDP (port 6454).
+- A _node_ is a device that translates to/from DMX512. A _controller_ is a device that controls/monitors fixtures/devices.
+- Supports 32 768 universes (as of v4). The 15-bit _port-address_ attribute consists of the _net_ (bits 14–8), the _sub-net_ (bits 7–4) and the _universe_ (bits 3–0). Each node should generally use only a single net and sub-net value, with 16 universes available.
+- The IP address is either derived from the MAC address or assigned using DHCP (if capable). If using DHCP, the subnet may be routed. If static/deterministic, subnet `2.0.0.0/8` or `10.0.0.0/8` will be used (controllers should use both to discover all devices).
+- Controllers use the `ArtPoll` packet as a directed broadcast to discover other controllers and nodes, every 2.5–3 seconds. It may optionally target a range of port-addresses. Devices should wait a random delay of 0–1 seconds before answering with an `ArtPollReply` packet to avoid congestion.
+- Controllers may reconfigure the IP address and port-addresses of nodes (if supported).
+- The `ArtDmx` packet is used to send DMX512 data to/from devices/controllers. DMX outputs should periodically retransmit the last DMX frame they received. DMX inputs with active, non-changing DMX input should periodically retransmit the last DMX frame into Art-Net. `ArtDmx` should always use unicast, with universe subscribers discovered from `ArtPoll` replies.
+- The `ArtSync` packet may optionally be used to synchronize multiple universes, e.g. for use with video. It uses broadcast, unlike `ArtDmx` which uses unicast to each node. Nodes boot into non-synchronous mode, but switch to synchronous mode when the first `ArtSync` packet is received. In sync mode, nodes must buffer `ArtDmx` packets until receiving the next `ArtSync` packet. After 4 seconds of not receiving `ArtSync` packets, sync nodes should switch back to non-sync mode.
+- Node DMX inputs are all enabled by default, but can be disabled by the controller to avoid pointless Art-Net traffic.
+- Node firmware may be upgraded over Art-Net.
+- Supports Remote Device Management (RDM). Nodes individually handle RDM discovery (full and incremental) and maintain their own RDM device lists (`ArtTodData` packets are automatically broadcast on any table of devices (TOD) change). Input gateways are used for DMX controllers to query node device lists out of Art-Net and back into DMX/RDM, by querying Art-Net nodes (with an `ArtTodRequest` broadcast packet) and maintaining their own device list built from other nodes' device lists. Art-Net controllers act the same way as input gateways. RDM get and set commands may be used from the controller over Art-Net (using `ArtRdm` and `ArtRdmSub` packets).
+
+## Remote Device Management (RDM)
+
+- Used to remotely configure "menu settings" (like the DMX address) on devices.
 - This is extremely useful both for fixing settings on inaccessible devices, as well as making it generally more practical to configure devices.
 - Electrically, RDM is run on the same cable as DMX512 and transmits data between DMX signals/commands on the same data wires.
 - All of the controller, devices and eventual splitters/repeaters need to support RDM in order for it to work.
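
The 15-bit port-address layout described in the Art-Net 4 notes above (net in bits 14–8, sub-net in bits 7–4, universe in bits 3–0) can be illustrated with a little bit arithmetic. The following is a minimal sketch only: the `PortAddress` struct and the function names `split_port_address` and `join_port_address` are hypothetical helpers made up for illustration, not part of any Art-Net library.

```cpp
#include <cstdint>

// Hypothetical helper type; field names follow the terms used in the notes.
struct PortAddress {
    std::uint8_t net;       // bits 14-8 (7 bits, 0-127)
    std::uint8_t sub_net;   // bits 7-4  (4 bits, 0-15)
    std::uint8_t universe;  // bits 3-0  (4 bits, 0-15)
};

// Split a 15-bit port-address into net, sub-net and universe.
constexpr PortAddress split_port_address(std::uint16_t port_address) {
    return PortAddress{
        static_cast<std::uint8_t>((port_address >> 8) & 0x7F),
        static_cast<std::uint8_t>((port_address >> 4) & 0x0F),
        static_cast<std::uint8_t>(port_address & 0x0F)};
}

// Combine the three fields into a 15-bit port-address.
// Out-of-range field values are masked rather than rejected.
constexpr std::uint16_t join_port_address(PortAddress p) {
    return static_cast<std::uint16_t>(((p.net & 0x7F) << 8) |
                                      ((p.sub_net & 0x0F) << 4) |
                                      (p.universe & 0x0F));
}

// Compile-time checks: the largest port-address is 32 767, which gives
// the 32 768 universes mentioned in the notes.
static_assert(join_port_address({0x7F, 0x0F, 0x0F}) == 32767, "max port-address");
static_assert(split_port_address(0x0123).net == 1, "net field");
static_assert(split_port_address(0x0123).sub_net == 2, "sub-net field");
static_assert(split_port_address(0x0123).universe == 3, "universe field");
```

A node that sticks to a single net and sub-net value only varies the low 4 bits, which is where the 16 universes per node come from.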

+ 117 - 0
se/gcc.md

@@ -0,0 +1,117 @@
+---
+title: GNU Compiler Collection (GCC)
+breadcrumbs:
+- title: Software Engineering
+---
+{% include header.md %}
+
+An optimizing compiler for C, C++ etc.
+
+The notes below mainly apply to C/C++, unless otherwise stated.
+
+## Usage
+
+- Compilation (C++): `g++ ${CPPFLAGS} ${CXXFLAGS} -c -o file1.o file1.cpp`
+- Linking (C++): `g++ ${LDFLAGS} -o app file1.o file2.o ${LDLIBS}` (libraries should come after the object files that use them)
+
+## Common Options
+
+- (Note) Flags are typically activated with `-f<name>` and deactivated with `-fno-<name>`.
+- Set language standard: `-std=<standard>`
+    - For specifying C or C++, use the `gcc` or `g++` compiler executables, respectively.
+    - For C, use e.g. `-std=gnu11`.
+    - For C++, use e.g. `-std=c++17`.
+- Set optimization level with `-O<level>`:
+    - `-O0` (default): No optimization (almost). Useful for debugging to produce predictable code.
+    - `-O1`/`-O`: Performance optimization. Reduces code size and execution time.
+    - `-O2`: Performance optimization, extends `-O1`.
+    - `-O3`: Performance optimization, extends `-O2`.
+    - `-Os`: Space optimization, extends `-O2`, but disables flags that increase code size.
+    - `-Ofast`: Based on `-O3`, but enables extra non-standards-compliant flags like `-ffast-math` for even better (non-standards-compliant) performance.
+    - `-Og`: Optimize for debugging. Gives fast compilation and code that is easier to debug. Similar to `-O0` and `-O1` wrt. optimizing flags, except those that would degrade debugging experience.
+- Set architecture to produce code for: `-march=<cpu-type>` (example)
+    - Architecture `native` means whatever architecture the current system is using.
+    - Supports more generic architectures like `x86-64` as well as more specific ones like `skylake`.
+    - Use this to optimize for the specific architecture, e.g. for HPC applications.
+    - Do not use this if the executable is supposed to run on systems with other CPU architectures.
+    - Implies `-mtune=<cpu-type>`, for tuning the code for the given architecture.
+- **TODO**: `-matomic`
+- Enable OpenMP support: `-fopenmp`
+    - For parallelizing code, using preprocessor directives (`#pragma omp <...>`).
+- Position-independent code: `-fpic`
+    - This avoids using absolute addresses for jumps etc.
+    - Shared libraries should enable this. For other applications, it doesn't matter.
+- Vectorize loops: `-ftree-vectorize`
+    - Attempt to use SIMD instructions (e.g. AVX) to vectorize loops of math operations.
+    - Should be used with `-march=native` to make sure AVX is supported.
+    - Implied by `-O3` and `-Ofast`.
+- Enable fast but non-IEEE-compliant math: `-ffast-math`
+    - Enable this if the program doesn't depend on an exact IEEE math implementation to produce correct results.
+    - This may significantly improve floating-point performance, due to e.g. the restructuring of calculations which may produce different rounding errors.
+    - Implied by `-Ofast`.
+- Omit frame pointer: `-fomit-frame-pointer`
+    - Avoids storing the frame pointer in a register for functions that don't need it (most functions, especially small ones).
+    - Reduces code size and register usage.
+    - Makes debugging very hard.
+- Disable C++ exceptions (if not wanted): `-fno-exceptions`
+    - If you want code without exception support, just disable it. Some people have strong opinions about whether exceptions are a good or bad feature.
+
+### Warning Options
+
+- Enable common warnings: `-Wall`
+- Enable extra warnings: `-Wextra`
+- Enable warnings for strict ISO C/C++ compliance: `-Wpedantic`
+- Treat all warnings as errors: `-Werror`
+    - Alternatively, treat specific warnings as errors with `-Werror=<warning>` (e.g. `-Werror=switch` for `-Wswitch`).
+
+### Hardening Options
+
+These flags should be used with applications with unsafe input. For HPC applications which use trusted input and require maximum performance, most of these flags should be disabled (not specified).
+
+- Add string and memory overflow protection: `-D_FORTIFY_SOURCE=2` (or `1`)
+    - Adds compile-time and run-time checks to protect against buffer overflows in memory and string functions.
+    - Alternatively, use `-D_FORTIFY_SOURCE=1` for a more conservative set of checks that should not change the behavior of conforming programs.
+    - Compile-time checks validate operations on constant-size data.
+    - Run-time checks validate operations on variable-size data, mainly by replacing functions like `memcpy` with built-in functions like `__memcpy_chk`.
+- Add extra libstdc++ error checking: `-D_GLIBCXX_ASSERTIONS`
+    - Enables precondition assertions for e.g. checking string bounds and checking for null pointers when dereferencing smart pointers.
+- Add stack smash protection: `-fstack-protector-strong`
+    - Adds run-time checks to protect against stack smashing attacks.
+    - `-fstack-protector-strong` protects more functions than `-fstack-protector`, but fewer than `-fstack-protector-all` (which protects all functions).
+    - Use for programs with unsafe input (like servers and multiplayer games), disable for e.g. HPC applications which require max performance.
+- Add stack clash protection: `-fstack-clash-protection`
+    - Adds code to prevent stack clash style attacks.
+- Add control flow integrity protection: `-fcf-protection`
+    - Prevents unexpected jump targets (divergent control flow).
+    - For newer Intel processors, this uses Intel Control-flow Enforcement Technology (CET). (Specifying `-mcet` is not required to use this.)
+- Detect and reject underlinking: `-Wl,-z,defs` (linker)
+- Disable lazy binding: `-Wl,-z,now` (linker)
+- Read-only segments after relocation: `-Wl,-z,relro` (linker)
+- Enable full address space layout randomization (ASLR): `-fpie -fpic -shared` (compiler) `-Wl,-pie` (linker)
+    - This may require other options and run-time system features, so look it up if you need it.
+- Disable text relocation for shared libraries: `-fpic -shared`
+    - Related to ASLR.
+    - Use only for shared libraries.
+
+### Undefined Behavior Sanitizer (ubsan) Options
+
+ubsan is a run-time checker for different types of undefined behavior.
+
+**TODO** https://developers.redhat.com/blog/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan
+
+### Debug Options
+
+- Embed debugging info (compiler and linker): `-g`
+- Add compiler flags to debug info: `-grecord-gcc-switches`
+
+## Common Libraries (`-l`)
+
+- C math library (`math.h`): `-lm`
+    - For C++, it's automatically included with the stdlib.
+
+## Miscellaneous Options
+
+- Use piping instead of temporary files during compilation: `-pipe`
+    - Should yield better compilation performance.
+
+{% include footer.md %}
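
The `-ffast-math` note in `se/gcc.md` above mentions that restructured calculations may produce different rounding errors. A minimal sketch of that effect follows, written so that it needs no special flags: it sums the same values in two different orders by hand, the second order resembling what a 2-lane vectorized reduction would compute. The function names and the input values are made up for illustration, and the expected results assume IEEE 754 double precision with round-to-nearest and compilation _without_ `-ffast-math`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum in strict left-to-right order, i.e. the order the source code specifies.
double sum_sequential(const std::vector<double>& values) {
    double total = 0.0;
    for (double v : values) {
        total += v;
    }
    return total;
}

// Sum with two interleaved accumulators, similar in spirit to what a
// 2-lane vectorized reduction computes after reassociating the additions.
double sum_two_lanes(const std::vector<double>& values) {
    double lane[2] = {0.0, 0.0};
    for (std::size_t i = 0; i < values.size(); ++i) {
        lane[i % 2] += values[i];
    }
    return lane[0] + lane[1];
}

// Hypothetical self-check, meant to be called from main().
// Assumes IEEE 754 doubles and no -ffast-math.
void check_example() {
    const std::vector<double> values = {1e16, 1.0, -1e16, 1.0};
    // Sequential: 1e16 + 1.0 rounds back to 1e16, so one of the 1.0s is lost.
    assert(sum_sequential(values) == 1.0);
    // Two lanes: (1e16 - 1e16) + (1.0 + 1.0) keeps both.
    assert(sum_two_lanes(values) == 2.0);
}
```

Because the two orders can give different results, a compiler following strict IEEE semantics generally has to keep the sequential order for floating-point reductions. Options that permit reassociation, such as `-ffast-math`, lift that restriction.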