Håvard Ose Nordstrand 4 years ago
parent
commit
defd6da6a3
2 changed files with 20 additions and 7 deletions
  1. 19 6
      config/general/computer-testing.md
  2. 1 1
      se/hpc/cuda.md

+ 19 - 6
config/general/computer-testing.md

@@ -6,12 +6,6 @@ breadcrumbs:
 ---
 {% include header.md %}
 
-## Information Gathering
-
-### Linux
-
-- Show CPU vulnerabilities: `tail -n +1 /sys/devices/system/cpu/vulnerabilities/*`
-
 ## CPU
 
 ### Prime95
@@ -74,4 +68,23 @@ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=1m --size=16G --
 - For health testing.
 - See [smartmontools](/config/linux-general/applications/#smartmontools).
 
+## Miscellanea
+
+### Linux
+
+- Show CPU vulnerabilities: `tail -n +1 /sys/devices/system/cpu/vulnerabilities/*`
+- PCIe link speed for device:
+    - Make sure the device is doing something intensive so that the PCIe speed isn't degraded.
+    - Run `sudo lspci -vv`, find the device (e.g. `NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A]`) and look for the `LnkCap` and `LnkSta` lines under "Capabilities".
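+    - `LnkCap` is the device capability and `LnkSta` is the current status. Both show the PCIe speed (maximum and current, respectively) and the number of lanes. The speed is the per-lane transfer rate in GT/s, which identifies the PCIe version and is _roughly_ the throughput in GB/s of 8 lanes of that version.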
+    - Example `LnkSta` (1): `Speed 16GT/s (ok), Width x16 (ok)`, meaning PCIe 4.0, using 16 lanes.
+    - Example `LnkSta` (2): `Speed 8GT/s (ok), Width x4 (downgraded)`, meaning PCIe 3.0, downgraded to 4 lanes, e.g. because the motherboard can't run that many PCIe devices at full width.
+    - PCIe speed cheat sheet:
+        - PCIe 1 (2.5GT/s, 250MB/s per lane)
+        - PCIe 2 (5GT/s, 500MB/s per lane)
+        - PCIe 3 (8GT/s, 985MB/s per lane)
+        - PCIe 4 (16GT/s, 1.97GB/s per lane)
+        - PCIe 5 (32GT/s, 3.94GB/s per lane)
+        - PCIe 6 (64GT/s, 7.56GB/s per lane)
+
 {% include footer.md %}
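
Reviewer note: the `LnkSta` interpretation and the cheat sheet added above can be combined into a small helper. The following is a minimal sketch, not part of the commit. The function name `parse_link` and the lookup tables are hypothetical, the per-lane figures are copied from the cheat sheet for PCIe 1 to 5 only, and the input is assumed to be a single `LnkCap` or `LnkSta` line as printed by `lspci -vv`.

```python
import re

# Approximate usable bandwidth per lane in GB/s, keyed by transfer rate in GT/s.
# Values are taken from the cheat sheet (PCIe 1 to 5); this table is an assumption
# of this sketch, not something lspci reports.
PER_LANE_GB_PER_S = {2.5: 0.25, 5.0: 0.5, 8.0: 0.985, 16.0: 1.97, 32.0: 3.94}

# PCIe generation for each transfer rate.
GENERATION = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}


def parse_link(line):
    """Return (generation, lanes, approx_total_GB_per_s) for an LnkCap/LnkSta line."""
    match = re.search(r"Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)", line)
    if match is None:
        raise ValueError(f"unrecognized link line: {line!r}")
    rate = float(match.group(1))
    lanes = int(match.group(2))
    return GENERATION[rate], lanes, PER_LANE_GB_PER_S[rate] * lanes


if __name__ == "__main__":
    for example in (
        "LnkSta: Speed 16GT/s (ok), Width x16 (ok)",
        "LnkSta: Speed 8GT/s (ok), Width x4 (downgraded)",
    ):
        gen, lanes, total = parse_link(example)
        print(f"PCIe {gen}, x{lanes}, about {total:.2f} GB/s")
```

Comparing the result for the `LnkCap` line with the result for the `LnkSta` line of the same device shows how much bandwidth is lost to a downgraded link.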

+ 1 - 1
se/hpc/cuda.md

@@ -63,7 +63,7 @@ breadcrumbs:
 - The only memory the host can copy data into or out of.
 - The only memory threads from different blocks can share data in.
 - Statically declared in global scope using the `__device__` declaration or dynamically allocated using `cudaMalloc`.
-- Global memory coalescing: When multiple threads in a warp access global memory, the device will try to _coalesce_ the access into as few transactions as possible in order to minimize memory load.
+- Global memory coalescing: When multiple threads in a warp access global memory in a contiguous fashion (e.g. when all threads in the warp access sequential parts of an array), the device will try to _coalesce_ the accesses into as few transactions as possible in order to minimize memory load.
 
 #### Local Memory
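
Reviewer note on the global memory coalescing line changed in this hunk: the effect of the access pattern can be illustrated with a deliberately simplified model, not part of the commit. The sketch assumes a warp of 32 threads, 4-byte elements, an aligned base address and 128-byte memory segments. Real devices differ in segment size and caching behavior, so the numbers are illustrative only, and the function name `transactions_per_warp` is hypothetical.

```python
def transactions_per_warp(stride, elem_size=4, warp_size=32, segment_size=128):
    """Count the distinct memory segments touched when thread i reads element i * stride.

    Simplified model: one transaction per touched segment, base address aligned to 0.
    """
    segments = {(i * stride * elem_size) // segment_size for i in range(warp_size)}
    return len(segments)


if __name__ == "__main__":
    for stride in (1, 2, 32):
        count = transactions_per_warp(stride)
        print(f"stride {stride}: {count} transaction(s)")
```

In this model, sequential access (stride 1) fits in a single segment, while larger strides spread the same 32 reads over more segments, so the access coalesces less well.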