Consistent software provisioning (esp. research applications) on Linux

Overview

Because Linux workstations and compute servers at the institute are centrally managed by IT and installing computers is a highly automated process, we can guarantee that all institute computers behave identically. The IT department prioritizes software consistency to ensure reproducibility of research results. Each computer is assigned a software platform generation upon installation, and IT strives to prevent abrupt changes. The supported software for each software platform generation is listed at SoftwareLinux. If you need new software or different versions, please submit a ticket. You are also allowed to install software by yourself. This page sheds some light on the details and concepts of software deployment at the institute.

Software platform generations

The operating system is identified by a generation number, which is seven as of December 29, 2023. Be aware that only the most up-to-date software platform generation is available for newly installed computers. This explicitly includes computers that have to be re-installed because of broken hard disks.

No computation software will simply disappear one day on a given platform. However, a software package might no longer be supported on the next platform generation. IT minimizes changes to software configurations within a generation, so no computation software will get major upgrades. There are some exceptions, especially for programs with a large attack surface:

  • Office programs (e.g. LibreOffice)
  • Web browsers (e.g. Firefox)

These programs are not considered critical when it comes to reproducibility of computation results. Packages are pre-installed with unique IDs and version numbers, which IT strives to keep consistent across all machines within a generation. Updates to critical packages (e.g. an FFT library) are limited to security and bug fixes without introducing new behavior.

Software installed in the network

Installing multiple versions of software on a single computer can be complex, but it's important to ensure users access the correct version without wasting storage. Specific versions are often required for scripted tasks to maintain reproducibility.

Two methods are used for network-installed software:

  1. Packages installed in network locations:
    • Multiple versions are available, accessed via the command line.
      • Enable the software (possibly selecting a version), then use its commands.
      • For Matlab, use MATLAB or MATLAB --version 9.12 to enable, then matlab to run.
      • The matlab command alone runs the latest version.
    • Since this is a research institute, flexibility regarding software installation is necessary. This is what personal storage blocks (e.g. /data/u_someuser_software) are for. Feel free to request one and install software there. These storage blocks are guaranteed to be accessible only by one user.
  2. Containerized software:
    • Software is packaged with its platform in a container to reduce adaptation effort.
    • We use Singularity for that.
    • Access containerized software with the sc command, e.g., sc fsl lists the available FSL versions. See "How to use Singularity software?" below.

Environments

In Unix it's common to use one tool per task. The institute's environment concept lets you pick tool versions independently from each other, based on SoftwareServiceLinux. Remember: Activate the environment each time you need it.

Example:
  • You have a script that needs Freesurfer, FSL, ANTs and Matlab. This is how you prepare the script's work environment:
    user@host > FSL FREESURFER ANTSENV MATLAB
  • A shell is started in which your script should work fine: all the commands of the respective software packages are available.
  • The order in which environments are enabled should not matter.
    • Sometimes, however, there are subtle interdependencies, e.g. Freesurfer depends on FSL. In this case it's a good idea to enable FSL first.
    • Nothing will silently break because of a wrong order. However, Freesurfer might choose a different FSL version than you expect, and the FSL wrapper will later complain about this and return an error.
  • The environment chosen last will show up in your prompt:
    matlab=9.3_user@host >
  • You might want to know the configuration of your work environment:
    matlab=9.3_user@host > cbsenv -L
    fsl 5.0.9
    freesurfer 6.0.0
    ants 2.1.0-rc3
    matlab 9.3
  • You might want to make sure your script uses the same releases of FSL, Freesurfer, ... every time:
    user@host > FSL --version 5.0.9 FREESURFER --version 6.0.0 ANTSENV --version 2.1.0-rc3 MATLAB --version 9.3
  • If you need an environment configuration more often, you should assign an alias (in this example e1). You could use a "shell alias", but this method is more flexible:
    user@host > mkdir -p ~/bin
    user@host > echo 'FSL --version 5.0.9 FREESURFER --version 6.0.0 ANTSENV --version 2.1.0-rc3 MATLAB --version 9.3 "$@"' > ~/bin/e1
    user@host > chmod 755 ~/bin/e1
  • You might need to re-login once if the folder ~/bin didn't exist before.
  • Now the new command e1 is available to you. It is not available to other users. The command can be used in two ways:
    1. user@host > e1 to start an interactive shell
    2. user@host > e1 somescript --someparams ... to start a script directly. This syntax can be used in the institute's ComputeClusterSlurm as well.
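
The alias file can also be generated by a small helper, which avoids the quoting pitfall of writing "$@" through echo. The following is only a sketch: the function name write_env_alias and its argument layout are hypothetical and not part of the institute's tooling. It assumes the wrapper chain behaves as in the example above.

```shell
# write_env_alias FILE WRAPPER_ARGS...
# Hypothetical helper (not an institute command): writes a script to FILE
# that runs the given wrapper chain and forwards its own arguments.
write_env_alias() {
    target=$1
    shift
    mkdir -p "$(dirname "$target")"
    {
        printf '#!/bin/sh\n'
        # "$*" (the pinned wrapper chain) is expanded now;
        # "$@" is written literally so the alias forwards its arguments later.
        printf '%s "$@"\n' "$*"
    } > "$target"
    chmod 755 "$target"
}
```

With this helper, write_env_alias ~/bin/e1 FSL --version 5.0.9 FREESURFER --version 6.0.0 ANTSENV --version 2.1.0-rc3 MATLAB --version 9.3 would produce an e1 command equivalent to the one created above.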

If you have any questions about the concept, don't hesitate to ask IT.

FAQ

I need the version number of a package on a certain computer in the past. What can I do?

IT stores historical package version records for several years. If you need to know the version number of a package on a certain computer in the past, please write a ticket.

Which generation does computer X belong to?

1. A workstation before login:

For quick identification, the wallpaper of the login screen is identical on all workstations in the same generation.

2. The workstation you're logged into:

Type this command into a shell:

user@host > distri -g

It will show the numerical software platform generation of the computer.

3. A remote compute server:

When logging into a computer via ssh or getserver -s, the generation number is always shown (look at "OS:"):

I: Connecting via SSH to 'silbermond'
Linux silbermond 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64

  _    This is a Interactive compute server
 (v)   OS       : Generation 7, Kernel: 5.10.0-28
//_\\  Hardware : 376.57 GiB, 32x3600 bMIPS, CPU: Xeon Silver 4108 1.80GHz
(U_U)  Where    : C-20/rack2:35-38
       Services : ComputeLinuxDedicated 
   _
  / \   This server is restarted 7am on the 1st Wednesday each month.
 / ! \  
/_____\ 

If you're already logged in, the distri -g command will work as well.

4. In a script:

It's good practice for programs to test the assumptions they make about their environment. To test whether a script is running on a generation 7 computer, put this line at the beginning:

distri -g | grep -qw 7 || exit 1
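
If a silent exit is too terse, the same check can be wrapped so that it reports what went wrong. This is a minimal sketch: the function name require_generation is hypothetical, and since the exact output format of distri -g is not specified here, it uses the same word match (grep -qw) as the one-liner above.

```shell
# require_generation EXPECTED
# Hypothetical helper: returns 0 if distri -g reports the expected
# generation, 1 if it reports something else, 2 if distri itself fails.
require_generation() {
    expected=$1
    actual=$(distri -g) || {
        echo "error: 'distri -g' failed" >&2
        return 2
    }
    # Same whole-word match as in the one-liner above.
    if ! printf '%s\n' "$actual" | grep -qw -- "$expected"; then
        echo "error: expected generation $expected, but distri -g reported: $actual" >&2
        return 1
    fi
}
```

At the beginning of a script this would be used as: require_generation 7 || exit 1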

What do CAPITALLETTERS+ (e.g. R+) commands do?

These commands are environment wrappers that change the version selection of locally installed packages. Apart from that, they behave identically to other environment wrappers.

Example:
  • R is sometimes installed directly on computers (because other packages depend on it).
  • The R release installed locally is usually very old, and you'd have a hard time installing recent extensions on it.
  • Most researchers in the institute use R+, an environment wrapper that lets you select either a specific or a more recent version.
  • To run a recent version of R, you'd type R+ R instead of just R.

How to use Jupyter Notebook?

It's usually easiest to run Jupyter Notebook on the computer your interactive session runs on.

The reason:
  1. Jupyter Notebook is a little web server that calls a web browser (default is Firefox) to render the notebook.
  2. Firefox can only run once per account in the whole institute.
  3. Your browser usually already runs on the computer your graphical session runs on.

If you want to run Jupyter Notebook on another computer (e.g. a compute server), there's a solution. Unfortunately, there's no easy way to automate it:
  1. Use the browser in your graphical session
  2. Make a tunnel to connect to Jupyter's little web server

user@mycomputer> getserver -sL

Given that we're connected to someserver now:

user@someserver> JUPYTER jupyter notebook

A URL like this will be shown:

http://localhost:8890/?token=b5e109e9c284ae95daac704e14ca6ba21779a20ab1dca32b

Remember the port number (8890 in this case).

Back on your computer (in another terminal) type this command to establish a web tunnel:

user@mycomputer> ssh -L 8890:localhost:8890 someserver

and open the URL from the other terminal in your browser. Both terminals can be minimized after that. BTW: This tunnel solution can be applied to custom Jupyter installations as well.
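
To avoid copying the port number by hand, the tunnel command can be derived from the URL that Jupyter prints. This is only a sketch: the function name tunnel_command is hypothetical, and it assumes the URL has the http://localhost:PORT/... form shown above. It prints the ssh command instead of running it, so you can check it first.

```shell
# tunnel_command URL SERVER
# Hypothetical helper: extracts the port from a Jupyter URL and prints
# the matching ssh port-forwarding command for SERVER.
tunnel_command() {
    url=$1
    server=$2
    # Take the digits that follow "://localhost:".
    port=$(printf '%s\n' "$url" | sed -n 's#.*://localhost:\([0-9][0-9]*\).*#\1#p')
    if [ -z "$port" ]; then
        echo "error: no port found in URL: $url" >&2
        return 1
    fi
    printf 'ssh -L %s:localhost:%s %s\n' "$port" "$port" "$server"
}
```

For the URL above, tunnel_command 'http://localhost:8890/?token=...' someserver prints the ssh -L command shown in this section.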

How to use Singularity software?

Some software packages are provided as "software containers". These packages can be used via the sc command. Find a list of these packages and their IDs at SoftwareLinux.

The physical location of the container repository is /data/p_SoftwareServiceLinux_sc. However, you don't need to know that, and you should not use these containers without the sc command.

To use a software package, you need the ID e.g. fsl. There are some commands that will come in handy:

This will show available versions of a given containerized software package:

user@host > sc fsl

This will start a shell in the latest available FSL container environment:

user@host > sc fsl latest

This will start a shell in a specific FSL container environment:

user@host > sc fsl 6.0.4

This will start a command in the containerized environment (a version can be given instead of latest):

user@host > sc fsl latest eddy_cuda9.1 ...

You have access to all storage resources of the institute in a container. Example:

user@host > sc fsl latest ls -la /afs/cbs.mpg.de /data/dt_transfer
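
For scripted work, it may help to pin the container version in one place instead of relying on latest. The following is a sketch only: the function name fsl_run and the variable FSL_VERSION are hypothetical, and the default 6.0.4 is simply the version used in the example above.

```shell
# Pin the container version once; override by setting FSL_VERSION beforehand.
# (FSL_VERSION and fsl_run are illustrative names, not institute tooling.)
FSL_VERSION=${FSL_VERSION:-6.0.4}

# fsl_run COMMAND [ARGS...]
# Runs COMMAND inside the pinned FSL container via the sc command.
fsl_run() {
    sc fsl "$FSL_VERSION" "$@"
}
```

A script would then call, for example, fsl_run ls -la /data/dt_transfer, and changing the pinned version requires editing only one line.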

How to install software by myself?

IT is very strict about software installation whenever installing it requires administrative permissions. One important reason is that, with administrative permissions, changes to a computer's software setup might have subtle influences on other users. However, software installation can be as easy as unpacking a zip file into a folder and running a program in it. When it comes to research-related software, this is perfectly fine.

Choose a good location for your software, depending on the context in which you want to use it. Each user can get a personal storage block to install and customize software in. Request one from IT.

Hints/Warnings:
  • Do not install security-sensitive software by yourself. This explicitly includes
    • interactive software communicating over the internet (e.g. Web browsers)
    • software parsing complex data types (PDF editors)
  • If the software you want to install is relevant for a bigger group of people, contact IT and ask for it to be installed by IT. This will save time and prevent problems related to operating system updates (which are likely with self-installed software).
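
As an illustration of the "unpack into a folder" approach, the sketch below unpacks an archive into a versioned subfolder, so that several releases can coexist. Everything here is an assumption for illustration: the function name install_tarball, the name/version folder layout, and the use of a .tar.gz archive (a zip file would need unzip instead of tar).

```shell
# install_tarball ARCHIVE PREFIX NAME VERSION
# Hypothetical helper: unpacks ARCHIVE (.tar.gz) into PREFIX/NAME/VERSION
# and prints that directory. Refuses to overwrite an existing install.
install_tarball() {
    archive=$1
    prefix=$2
    name=$3
    version=$4
    dest=$prefix/$name/$version
    if [ -e "$dest" ]; then
        echo "error: refusing to overwrite existing $dest" >&2
        return 1
    fi
    mkdir -p "$dest" &&
        tar -xzf "$archive" -C "$dest" &&
        printf '%s\n' "$dest"
}
```

With a personal storage block as described above, a call could look like install_tarball sometool.tar.gz /data/u_someuser_software sometool 1.0, where the archive and tool names are placeholders.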

Topic revision: 11 Sep 2024, Burk2