Consistent software provisioning (esp. research applications) on Linux

Overview

Because Linux workstations and compute servers at the institute are centrally managed by IT and installing computers is a highly automated process, we can guarantee that all institute computers behave identically. The IT department prioritizes software consistency to ensure reproducibility of research results. Each computer is assigned a software platform generation upon installation, and IT strives to prevent abrupt changes. The supported software for each software platform generation is listed at SoftwareLinux. If you need new software or different versions, please submit a ticket. You are also allowed to install software yourself. This page sheds some light on the details and concepts of software deployment at the institute.

Software platform generations

The operating system is identified by a generation number, which is seven as of December 29, 2023. Be aware that only the most recent software platform generation is available for newly installed computers. This explicitly includes computers that have to be reinstalled because of broken hard disks.

No computation software will simply disappear one day on a given platform. However, a software package might no longer be supported on the next platform generation. IT minimizes changes to software configurations within a generation, so no computation software will get major upgrades. There are some exceptions, especially for programs with a large attack surface:

  • Office programs (e.g. LibreOffice)
  • Web browsers (e.g. Firefox)

These programs are not considered critical when it comes to reproducibility of computation results. Packages are pre-installed with unique IDs and version numbers, which IT strives to keep consistent across all machines within a generation. Updates to critical packages (e.g. an FFT library) are limited to security and bug fixes without introducing new behavior.

Software installed in the network

Installing multiple versions of software on a single computer can be complex, but it's important to ensure users access the correct version without wasting storage. Specific versions are often required for scripted tasks to maintain reproducibility.

Two methods are used for network-installed software:

  1. Packages installed in network locations:
    • Multiple versions are available, accessed via the command line.
      • Enable the software (possibly selecting a version), then use its commands.
      • For Matlab, use MATLAB or MATLAB --version 9.12 to enable, then matlab to run.
      • The matlab command alone runs the latest version.
    • Since this is a research institute, flexibility regarding software installation is necessary. This is what personal storage blocks (e.g. /data/u_someuser_software) are for. Feel free to request one and install software there. These storage blocks are guaranteed to be accessible only by one user.
  2. Containerized software:
    • Software is packaged with its platform in a container to reduce adaptation effort.
    • We use Singularity for that.
    • Access containerized software with the sc command, e.g. sc fsl to list the available versions. See the sc section of the FAQ below for details.

Environments

In Unix it's common to use one tool per task. The institute's environment concept lets you pick tool versions independently from each other, based on SoftwareServiceLinux. Remember: Activate the environment each time you need it.

Example:
  • You have a script that needs Freesurfer, FSL, ANTs and Matlab. To prepare the script's work environment, run:
    user@host > FSL FREESURFER ANTSENV MATLAB
  • This starts a shell in which your script should work fine: all the commands of the respective software packages are available.
  • The order in which environments are enabled should not matter.
    • Sometimes, however, there are subtle interdependencies, e.g. Freesurfer depends on FSL. In this case it's a good idea to enable FSL first.
    • Nothing will silently break because of a wrong order. However, Freesurfer might choose a different FSL version than you expect, in which case the FSL wrapper enabled later will complain about this and return an error.
  • The environment chosen last will show up in your prompt:
    matlab=9.3_user@host >
  • You might want to know the configuration of your work environment:
    matlab=9.3_user@host > cbsenv -L
    fsl 5.0.9
    freesurfer 6.0.0
    ants 2.1.0-rc3
    matlab 9.3
  • You might want to make sure your script always uses the same releases of FSL, Freesurfer, ...:
    user@host > FSL --version 5.0.9 FREESURFER --version 6.0.0 ANTSENV --version 2.1.0-rc3 MATLAB --version 9.3
  • If you need an environment configuration more often, you should give it a short name (in this example e1). You could use a shell alias, but a small wrapper script is more flexible:
    user@host > mkdir -p ~/bin
    user@host > echo 'FSL --version 5.0.9 FREESURFER --version 6.0.0 ANTSENV --version 2.1.0-rc3 MATLAB --version 9.3 "$@"' > ~/bin/e1
    user@host > chmod 755 ~/bin/e1
  • You might need to log in again once if the folder ~/bin didn't exist before.
  • Now the new command e1 is available to you. It is not available to other users. The command can be used in two ways:
    1. user@host > e1 to start an interactive shell
    2. user@host > e1 somescript --someparams ... to start a script directly. This syntax can be used in the institute's ComputeClusterSlurm as well.
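
To document which releases a computation actually used, the listing printed by cbsenv -L can be stored next to the results. The following is only a sketch: the function name log_environment and the log file location are assumptions made for illustration, and cbsenv is only expected to exist inside an enabled environment.

```shell
#!/bin/bash
# log_environment FILE (hypothetical helper, not provided by IT):
# write the date, the host name and the active environment
# configuration to FILE.
log_environment() {
    local logfile="$1"
    {
        echo "date: $(date)"
        echo "host: $(uname -n)"
        if command -v cbsenv >/dev/null 2>&1; then
            # cbsenv -L lists the enabled packages and their versions.
            cbsenv -L
        else
            echo "cbsenv not available (no environment enabled?)"
        fi
    } > "$logfile"
}
```

A script started via e1 could call log_environment with a file name of your choice as its first step, so that the versions in use are recorded together with the output data.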

If you have any questions about the concept, don't hesitate to ask IT.

FAQ

I need the version number of a package on a certain computer in the past. What can I do?

IT stores historical package version records for several years. If you need to know the version number of a package on a certain computer in the past, please write a ticket.

Which generation does computer X belong to?

1. A workstation before login:

For quick identification, the wallpaper of the login screen is identical on all workstations in the same generation.

2. The workstation you're logged into:

Type this command into a shell:

user@host > distri -g

It will show the numerical software platform generation of the computer.

3. A remote compute server:

When logging into a computer via ssh or getserver -s, the generation number is always shown (look at "OS:"):

I: Connecting via SSH to 'silbermond'
Linux silbermond 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64

  _    This is a Interactive compute server
 (v)   OS       : Generation 7, Kernel: 5.10.0-28
//_\\  Hardware : 376.57 GiB, 32x3600 bMIPS, CPU: Xeon Silver 4108 1.80GHz
(U_U)  Where    : C-20/rack2:35-38
       Services : ComputeLinuxDedicated 
   _
  / \   This server is restarted 7am on the 1st Wednesday each month.
 / ! \  
/_____\ 

If you're already logged in, the distri -g command will work as well.

4. In a script:

It's good practice for programs to test the assumptions they make about their environment. To test whether a script is running on a generation 7 computer, put this line at the beginning:

distri -g | grep -qw 7 || exit 1
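
The one-liner above exits silently. If you prefer an explanatory message, the same check can be wrapped in a small function. This is a sketch only: the function name require_generation and its return codes are assumptions, and it relies on distri -g printing the generation number as a separate word, just like the grep -qw test above does.

```shell
#!/bin/bash
# require_generation N (hypothetical helper): succeed only if
# `distri -g` reports generation N.
# Return codes: 0 = match, 1 = other generation, 2 = distri not found.
require_generation() {
    local expected="$1" actual
    if ! command -v distri >/dev/null 2>&1; then
        echo "distri not found; is this an institute computer?" >&2
        return 2
    fi
    actual="$(distri -g)"
    # Same whole-word match as the one-liner above.
    if ! printf '%s\n' "$actual" | grep -qw -- "$expected"; then
        echo "Expected generation $expected, but distri -g reports: $actual" >&2
        return 1
    fi
}
```

At the top of a script this would be used as require_generation 7 || exit 1.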

What do CAPITALLETTERS+ (e.g. R+) commands do?

These commands are environment wrappers that change the version selection of locally installed packages. Apart from that, they behave identically to other environment wrappers.

Example:
  • R is sometimes installed directly on computers (because other packages depend on it).
  • The R release installed locally is usually very old, and you'd have a hard time installing recent extensions on it.
  • Most researchers in the institute use R+ instead: an environment that lets you select either a specific or a more recent version.
  • To run a recent version of R, you'd type R+ R instead of just R.

sc: How to use Singularity software?

Some software packages are provided as images for software containers. These packages can be used via the sc command.

The physical location of the container repository is /data/p_SoftwareServiceLinux_sc. However, you don't need to know that, and you should not use these containers without the sc command.

To use a software package, you need its ID, e.g. fsl. The following commands will come in handy:

This will show available versions of a given containerized software package:

user@host > sc fsl

This will start a shell in the latest available FSL container environment:

user@host > sc fsl latest

This will start a shell in a specific FSL container environment:

user@host > sc fsl 6.0.4

This will start a command in the containerized environment (a version can be given instead of latest):

user@host > sc fsl latest eddy_cuda9.1 ...

You have access to all storage resources of the institute in a container. Example:

user@host > sc fsl latest ls -la /afs/cbs.mpg.de /data/dt_transfer

To combine multiple software packages easily, there's an extension to the sc method called SCWRAP, which is described in the next section. With plain sc, a multi-software script would look like this:

#!/bin/bash

# Good scientists abort computations upon unexpected problems
set -e

## We define some variables to not have to type so much per command.
## The easy way:
#fsl="sc fsl latest"
#ants="sc ants latest"

## The more scientific way with increased reproducibility:
fsl="sc fsl 6.0.6"
ants="sc ants 2.3.5"

cd /data/pt_12345/data
$fsl eddy_cuda9.1 ...
$ants antsRegistration ...
$fsl some_fsl_command ...
...

A script like that can operate on all /data folders. Each call of the sc command will spawn a small virtual environment, do its job and give control back to the script to run the next command.
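
The $fsl and $ants variables in the script above are expanded by word splitting, which works as long as no part of the prefix contains spaces. A slightly more robust variant defines shell functions instead. This is a sketch: the function names run_fsl and run_ants are made up for illustration, and the versions are the ones pinned in the script above.

```shell
#!/bin/bash
# Hypothetical helper functions: each one forwards all of its arguments
# unchanged to the sc command with a pinned container version.
run_fsl()  { sc fsl 6.0.6 "$@"; }
run_ants() { sc ants 2.3.5 "$@"; }

# Usage inside a script (commented out; the command names are the
# placeholders from the example above):
# run_fsl some_fsl_command ...
# run_ants antsRegistration ...
```

Because "$@" is quoted, arguments containing spaces (for example file names) reach the containerized command as single arguments.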

SCWRAP: How to combine commands in different Singularity containers?

Sometimes it is easier to call the commands of a software package without having to add a prefix like sc fsl latest. This is what SCWRAP is for. Here are the ideas behind it:
  1. Each respective software package is installed in a container image.
  2. Each software package provides command line tools.
  3. The command line tools are duplicated outside the container and wrapped in a way that each invocation will run the respective command in the correct container.
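
To illustrate the third point, a wrapper for a single tool could look roughly like the file generated below. This is not the actual SCWRAP implementation, which is not described on this page. It is a minimal sketch of the idea, and the helper name make_wrapper as well as the wrapper directory are assumptions.

```shell
#!/bin/bash
# make_wrapper DIR PACKAGE VERSION TOOL (hypothetical helper, not part
# of SCWRAP): create an executable file DIR/TOOL that runs TOOL inside
# the given container via the sc command.
make_wrapper() {
    local dir="$1" package="$2" version="$3" tool="$4"
    mkdir -p "$dir"
    # Unquoted here-document: $package, $version and $tool are filled
    # in now, while \$@ stays literal for the generated script.
    cat > "$dir/$tool" <<EOF
#!/bin/bash
# Forward all arguments to the tool inside the container.
exec sc $package $version $tool "\$@"
EOF
    chmod 755 "$dir/$tool"
}
```

With a directory of such wrappers placed first in PATH, calling the tool by its plain name would transparently run it in the matching container.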

This is what the example script from the sc section above would look like with SCWRAP:

#!/bin/bash

# Good scientists abort computations upon unexpected problems
set -e

cd /data/pt_12345/data
eddy_cuda9.1 ...
antsRegistration ...
some_fsl_command ...
...

To run it, enable the wrappers on the command line:

user@host > SCWRAP fsl 6.0.6 SCWRAP ants 2.3.5 myscript.sh

Jupyter

Find information about Jupyter use at the institute at SoftwareJupyter.

How to install software by myself?

This is possible and encouraged but there are constraints. IT is very strict about software installation whenever installing it requires administrative permissions. One important reason is that with such permissions changes to a computer's software setup might have subtle influences on other users. However, software installation can be as easy as unpacking a zip file into a folder and running a program in it. When it comes to research related software, this is perfectly fine.

Choose a good location for your software, depending on the context in which you want to use it. Each user can get a personal storage block to install and customize software in.
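
As a sketch of such a self-service installation, the following unpacks an archive into a folder and makes its commands available. The helper name install_tarball, the archive name and the bin subfolder are assumptions for illustration, and /data/u_someuser_software is the example storage block name used earlier on this page. For zip files, unzip -d serves the same purpose as tar -C.

```shell
#!/bin/bash
# install_tarball ARCHIVE DEST (hypothetical helper): unpack a .tar.gz
# archive into the folder DEST, creating DEST if necessary.
# No administrative permissions are needed for this.
install_tarball() {
    local archive="$1" dest="$2"
    mkdir -p "$dest"
    tar -xzf "$archive" -C "$dest"
}

# Example (commented out; all paths are placeholders):
# install_tarball ~/sometool-1.0.tar.gz /data/u_someuser_software/sometool-1.0
# export PATH="/data/u_someuser_software/sometool-1.0/bin:$PATH"
```

Keeping the version number in the folder name makes it possible to install several releases side by side and to pin one of them in your scripts.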

Hints/Warnings:
  • Do not install security sensitive software by yourself. This explicitly includes
    • interactive software communicating over the internet (e.g. Web browsers)
    • software parsing complex data types (PDF editors)
  • If the software you want to install is relevant for a bigger group of people, contact IT and ask for it to be installed at a central location. This will save time and prevent some common problems.

Topic revision: 06 Jun 2025, Burk2