How Cuda is handled on Linux at the institute
Overview
Cuda is Nvidia's framework for using the computation capacity of graphics cards and specialized computation hardware. This page explains which pieces of this framework are available on which institute computers, and how to use them.
Important:
- Cuda is a moving target. If something doesn't work as expected, please contact IT.
- The commands CUDA and getserver both provide a --help option!
Quick: How to use it?
Step 1: Make sure you are on a computer on which Cuda is available
Currently, workstations and compute servers running OS generations 7 and 8 are supported. Almost all workstations contain at least a basic Cuda-capable graphics card. You can check with this command:
user@host >
CUDA --versions
The command just checks for the respective software packages. To do a functional test, type:
user@host >
CUDA --test
If no Cuda release is available, you have three choices:
- Request an upgrade of your workstation's graphics card. Please keep in mind that putting expensive computation acceleration hardware into regular workstations is usually not an option, for various reasons. The default graphics card in institute workstations is an Nvidia GeForce 1030.
- Use the Slurm compute cluster to request GPU acceleration and run non-interactive jobs there.
- Connect to a dedicated compute server with suitable computation hardware.
Connecting to a compute server works like this:
user@host >
getserver -sL -C x
The x means "any Cuda version". You might want to set a constraint for a specific one:
user@host >
getserver -sL -C 12.5
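The two getserver calls above differ only in the constraint argument. As a purely illustrative sketch (the function name is made up for this example, and the command is printed rather than executed), a small helper that defaults the constraint to "any Cuda version" could look like this:

```shell
#!/bin/sh
# Sketch: assemble the getserver invocation, defaulting the Cuda
# constraint to "x" ("any Cuda version") when no release is given.
# The resulting command is only printed here, not executed.
cuda_server_cmd() {
    version="${1:-x}"
    echo "getserver -sL -C $version"
}
```

`cuda_server_cmd` then prints `getserver -sL -C x`, while `cuda_server_cmd 12.5` prints `getserver -sL -C 12.5`.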
Step 2: Enable the Cuda environment
The Cuda runtime/development files are enabled via an environment wrapper. Example:
user@host >
CUDA --version 12.0 myscript.sh
to run a script with runtime 12.0 enabled, or
user@host >
CUDA --version 12.5
to just open a shell with runtime 12.5 enabled.
The CUDA wrapper is guaranteed to fail hard if the requested Cuda release is not available. Do not circumvent this mechanism by accessing the respective directories directly!
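If you call the wrapper from your own scripts, it can help to fail early with a clear message when the wrapper itself is missing (e.g. on an unsupported host). A defensive sketch, using only the wrapper flags documented above; the guard and function name are additions of this example:

```shell
#!/bin/sh
# Sketch: run a job under a pinned Cuda release via the CUDA wrapper,
# relying on its documented fail-hard behaviour for missing releases
# instead of probing installation directories by hand.
run_with_cuda() {
    version="$1"; shift
    if ! command -v CUDA >/dev/null 2>&1; then
        echo "CUDA wrapper not installed on this host" >&2
        return 127
    fi
    # The wrapper itself exits with an error if release $version is missing.
    CUDA --version "$version" "$@"
}
```

Usage mirrors the example above: `run_with_cuda 12.0 myscript.sh`.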
Runtime vs. Development files
There are two main classes of Cuda components:
- Runtime: These libraries are required to run compiled, Cuda-enabled programs.
- Development files: These contain tools (e.g. for debugging) and the files required to compile Cuda programs.
Since Cuda is a huge framework, sometimes only the runtime is installed on a given computer.
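The distinction matters in practice: compiling needs the development files, while merely executing an already-compiled binary only needs the runtime. A minimal sketch for telling the two apart, under the assumption that nvcc (the Cuda compiler, which ships with the development files) is on the PATH once they are installed:

```shell
#!/bin/sh
# Sketch: distinguish a runtime-only host from one with development files.
# Assumption: the presence of nvcc indicates the development files,
# since the compiler is not part of a runtime-only installation.
have_cuda_dev() {
    command -v nvcc >/dev/null 2>&1
}

if have_cuda_dev; then
    echo "development files present: Cuda programs can be compiled here"
else
    echo "runtime-only (or no Cuda): binaries may run, but cannot be built here"
fi
```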
Different versions
Cuda releases depend heavily on up-to-date drivers to work with the specialized computation hardware. Unfortunately, up-to-date drivers in turn depend on up-to-date hardware and operating system kernel versions, and there are further subtle problems caused by licensing issues.
Digest: The runtime or development files required by your specific piece of software might not work on a given computer, given its hardware and operating system. Please contact IT in such cases.
Rules
The institute's software management will enforce these rules:
- By default, the latest Cuda runtime whose constraints are satisfied by a given computer is installed there. One exception:
No Cuda components are installed on Terminal servers (e.g. RemoteLinux ) since no computation should be done there.
- The development files of the latest supported Cuda release are automatically installed on each workstation and compute server.
Explicitly excluded from this rule are compute cluster nodes (because development is done interactively) and terminal servers (because development files depend on a runtime, which would allow computation to be done there).
- IT will install additional runtime or development file releases upon request.
- All Cuda components that cannot work on a given computer are automatically removed.
To enable experiments/testing/..., this rule can be disabled on a per-computer basis.