How Cuda is handled on Linux at the institute
Overview
Cuda is Nvidia's framework for using the computation capacity of graphics cards and specialized computation hardware. This page explains which pieces of this framework are available on which institute computers, and how to use them.
Important:
- Cuda is a moving target. If something doesn't work as expected, please contact IT.
- The commands CUDA and getserver both provide a --help option!
Quick: How to use it?
Step 1: Make sure you are on a computer on which Cuda is available
Currently, workstations and compute servers running OS generations 7 and 8 are supported. Almost all workstations contain at least a basic Cuda-capable graphics card. You can check with this command:
user@host >
CUDA --versions
The command just checks for the respective software packages. To do a functional test, type:
user@host >
CUDA --test
If no Cuda release is available, you have three choices:
- Request an upgrade of your workstation's graphics card. Please keep in mind that putting expensive computation acceleration hardware into regular workstations is usually not an option, for various reasons. The default graphics card in institute workstations is an Nvidia GeForce 1030.
- Use the Slurm compute cluster to request GPU acceleration and run non-interactive jobs there.
- Connect to a dedicated compute server with suitable computation hardware.
Connecting to a compute server works like this:
user@host >
getserver -sL -C x
The x means "any Cuda version". You might want to set a constraint for a specific one:
user@host >
getserver -sL -C 12.5
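The two getserver calls above differ only in the constraint argument. As a purely illustrative sketch (the function name is made up for this example, and the command is printed rather than executed), a small helper that defaults the constraint to "any Cuda version" could look like this:

```shell
#!/bin/sh
# Sketch: assemble the getserver invocation, defaulting the Cuda
# constraint to "x" ("any Cuda version") when no release is given.
# The resulting command is only printed here, not executed.
cuda_server_cmd() {
    version="${1:-x}"
    echo "getserver -sL -C $version"
}
```

`cuda_server_cmd` then prints `getserver -sL -C x`, while `cuda_server_cmd 12.5` prints `getserver -sL -C 12.5`.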
Step 2: Enable the Cuda environment
The Cuda runtime/development files are enabled via an environment wrapper. Example:
user@host >
CUDA --version 12.0 myscript.sh
to run a script with runtime 12.0 enabled, or
user@host >
CUDA --version 12.5
to just open a shell with runtime 12.5 enabled.
The CUDA wrapper is guaranteed to fail hard if the requested Cuda release is not available. Do not circumvent this mechanism by accessing the respective directories directly!
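If you call the wrapper from your own scripts, it can help to fail early with a clear message when the wrapper itself is missing (e.g. on an unsupported host). A defensive sketch, using only the wrapper flags documented above; the guard and function name are additions of this example:

```shell
#!/bin/sh
# Sketch: run a job under a pinned Cuda release via the CUDA wrapper,
# relying on its documented fail-hard behaviour for missing releases
# instead of probing installation directories by hand.
run_with_cuda() {
    version="$1"; shift
    if ! command -v CUDA >/dev/null 2>&1; then
        echo "CUDA wrapper not installed on this host" >&2
        return 127
    fi
    # The wrapper itself exits with an error if release $version is missing.
    CUDA --version "$version" "$@"
}
```

Usage mirrors the example above: `run_with_cuda 12.0 myscript.sh`.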
Runtime vs. Development files
There are two main classes of Cuda components:
- Runtime: These libraries are required to run compiled, Cuda-enabled programs.
- Development files: These contain tools (e.g. for debugging) and the files required to compile Cuda programs.
Since Cuda is a huge framework, sometimes only the runtime is installed on a given computer.
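The distinction matters in practice: compiling needs the development files, while merely executing an already-compiled binary only needs the runtime. A minimal sketch for telling the two apart, under the assumption that nvcc (the Cuda compiler, which ships with the development files) is on the PATH once they are installed:

```shell
#!/bin/sh
# Sketch: distinguish a runtime-only host from one with development files.
# Assumption: the presence of nvcc indicates the development files,
# since the compiler is not part of a runtime-only installation.
have_cuda_dev() {
    command -v nvcc >/dev/null 2>&1
}

if have_cuda_dev; then
    echo "development files present: Cuda programs can be compiled here"
else
    echo "runtime-only (or no Cuda): binaries may run, but cannot be built here"
fi
```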
Different versions
Cuda releases depend heavily on up-to-date drivers to work with the specialized computation hardware. Unfortunately, up-to-date drivers in turn depend on up-to-date hardware and operating system kernel versions, and there are further subtle problems caused by licensing issues.
Digest: The runtime or development files required by your specific piece of software might not work on a given computer, given its hardware and operating system. Please contact IT in such cases.
Rules
The institute's software management will enforce these rules:
- By default, the latest Cuda runtime whose constraints are satisfied by a given computer is installed there. One exception:
No Cuda components are installed on Terminal servers (e.g. RemoteLinux ) since no computation should be done there.
- The development files of the latest supported Cuda release are automatically installed on each workstation and compute server.
Explicitly excluded from this rule are compute cluster nodes (because development is done interactively) and terminal servers (because development files depend on a runtime, which would allow computation to be done there).
- IT will install additional runtime or development file releases upon request.
- All Cuda components that cannot work on a given computer are automatically removed.
To enable experiments/testing/..., this rule can be disabled on a per-computer basis.