---
title: CUDA
breadcrumbs:
- title: Configuration
- title: High-Performance Computing (HPC)
---
{% include header.md %}
NVIDIA CUDA (Compute Unified Device Architecture) Toolkit, for programming CUDA-capable GPUs.
## Related Pages
{:.no_toc}
## Resources
## Setup
### Linux
The toolkit on Linux can be installed in different ways:
- Through an existing package in your distro's repos (simplest and most compatible with other packages, but may be outdated; see the sketch below).
- Through a package manager package downloaded from NVIDIA (up to date, but may be incompatible with your installed NVIDIA driver).
- Through a runfile (same as the previous option, but more cross-distro and harder to manage).
If an NVIDIA driver is already installed, it must match the CUDA version.
Downloads: [CUDA Toolkit Download (NVIDIA)](https://developer.nvidia.com/cuda-downloads)
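
A minimal sketch of the distro-repo route, assuming Ubuntu/Debian (the package name `nvidia-cuda-toolkit` is Ubuntu's and may lag behind NVIDIA's current release):

```sh
# Install the CUDA toolkit from the distro's own repos (may be outdated).
apt update
apt install nvidia-cuda-toolkit

# Verify the compiler is available.
nvcc --version
```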
#### Ubuntu (NVIDIA CUDA Repo)
- Follow the steps to add the NVIDIA CUDA repo: [CUDA Toolkit Download (NVIDIA)](https://developer.nvidia.com/cuda-downloads), but don't install `cuda` yet.
- Remove anything NVIDIA or CUDA from the system to avoid conflicts: `apt purge --autoremove cuda nvidia-* libnvidia-*`
    - Warning: This may break your system. There may be better ways to do this.
- Install CUDA from the new repo (includes the NVIDIA driver): `apt install cuda`
- Set up the path: In `/etc/environment`, append `:/usr/local/cuda/bin` to the end of the `PATH` list. (A combined sketch of these steps is shown below.)
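
A minimal sketch of the sequence above, assuming an Ubuntu 22.04 x86_64 system and NVIDIA's `cuda-keyring` repo package (the exact URL and keyring version may differ; follow NVIDIA's download page for your distro):

```sh
# Add the NVIDIA CUDA repo via the cuda-keyring package (URL/version are examples).
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
apt update

# Remove existing NVIDIA/CUDA packages first (warning: may break the system).
apt purge --autoremove cuda nvidia-* libnvidia-*

# Install CUDA (pulls in the NVIDIA driver) from the new repo.
apt install cuda

# Append the CUDA bin dir to the PATH entry in /etc/environment, e.g.:
# PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/cuda/bin"

# Verify after a reboot.
nvidia-smi
nvcc --version
```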
### Docker Containers
- Docker containers may run NVIDIA applications using the NVIDIA container runtime for Docker (NVIDIA Container Toolkit); see the sketch below.
- TODO
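
A minimal sketch, assuming the NVIDIA Container Toolkit is installed and a Docker version with `--gpus` support (the image tag is an example; pick one matching your driver):

```sh
# Run nvidia-smi inside a CUDA base image with all GPUs exposed to the container.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# Expose only specific GPUs to the container.
docker run --rm --gpus '"device=0,1"' nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```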
### DCGM
- NVIDIA Data Center GPU Manager (DCGM), for monitoring GPU hardware and performance.
- See the DCGM exporter for monitoring NVIDIA GPUs from Prometheus (a sketch follows below).
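
A minimal sketch of running the standalone DCGM exporter as a container, assuming the NVIDIA container runtime is set up (the image tag is an example; check the dcgm-exporter project for a current one; it serves Prometheus metrics on port 9400 by default):

```sh
# Run the DCGM exporter and expose its metrics endpoint.
docker run -d --rm --gpus all -p 9400:9400 nvcr.io/nvidia/k8s/dcgm-exporter:3.1.8-3.1.5-ubuntu20.04

# Check that metrics are being served.
curl localhost:9400/metrics
```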
## Programming
See CUDA (software engineering).
## Usage and Tools
- Gathering system/GPU information with `nvidia-smi`:
    - Show overview: `nvidia-smi`
    - Show topology matrix: `nvidia-smi topo --matrix`
    - Show topology info: `nvidia-smi topo <option>`
    - Show NVLink info: `nvidia-smi nvlink --status -i 0` (for GPU #0)
    - Monitor device stats: `nvidia-smi dmon`
- To specify which devices are available to the CUDA application and in which order, set the `CUDA_VISIBLE_DEVICES` env var to a comma-separated list of device IDs (see the example below).
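
A minimal usage sketch (`./my-cuda-app` is a placeholder for any CUDA application):

```sh
# Expose only GPUs 1 and 0 to the application, in that order
# (physical device 1 becomes CUDA device 0 inside the application).
CUDA_VISIBLE_DEVICES=1,0 ./my-cuda-app

# Hide all GPUs from the application (empty list).
CUDA_VISIBLE_DEVICES= ./my-cuda-app
```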
{% include footer.md %}