Consistent software provisioning (esp. research applications) on Linux
Overview
Since Linux workstations and compute servers at the institute are centrally managed by IT, and installing computers is a highly automated process, we can guarantee that all institute computers of the same software platform generation behave identically. The IT department prioritizes software consistency to ensure the reproducibility of research results. Each computer is assigned a software platform generation upon installation, and IT strives to prevent abrupt changes. The supported software for each software platform generation is listed at SoftwareLinux. If you need new software or different versions, please submit a ticket. You are also allowed to install software by yourself. This page sheds some light on the details and concepts of software deployment at the institute.
The operating system is identified by a generation number, which is 7 as of December 29, 2023. Be aware that only the most up-to-date software platform generation is available for newly installed computers. This explicitly includes computers that have to be re-installed because of broken hard disks.
No computation software will simply disappear one day on a given platform. However, a software package might no longer be supported on the next platform generation. IT minimizes changes to software configurations within a generation, so no computation software will get major upgrades. There are some exceptions, especially for programs with a high attack surface:
- Office programs (e.g. LibreOffice)
- Web browsers (e.g. Firefox)
These programs are not considered critical when it comes to the reproducibility of computation results. Packages are pre-installed with unique IDs and version numbers, which IT strives to keep consistent across all machines within a generation. Updates to critical packages (e.g. an FFT library) are limited to security and bug fixes that do not introduce new behavior.
Software installed in the network
Installing multiple versions of software on a single computer can be complex, but it's important to ensure users access the correct version without wasting storage. Specific versions are often required for scripted tasks to maintain reproducibility.
Two methods are used for network-installed software:
- Packages installed in network locations:
  - Multiple versions are available, accessed via the command line.
  - Enable the software (possibly selecting a version), then use its commands.
  - For Matlab, use MATLAB or MATLAB --version 9.12 to enable it, then matlab to run it.
  - The matlab command alone runs the latest version.
  - Since this is a research institute, flexibility regarding software installation is necessary. This is what personal storage blocks (e.g. /data/u_someuser_software) are for. Feel free to request one and install software there. These storage blocks are guaranteed to be accessible only by a single user.
- Containerized software:
  - Software is packaged with its platform in a container to reduce adaptation effort.
  - We use Singularity for that.
  - Access containerized software with the sc command, e.g. sc fsl shows the available versions; learn more here.
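For scripted tasks, pinning the Matlab version keeps results reproducible even after a newer release becomes the default. The sketch below assumes that the MATLAB wrapper, like R+ in the FAQ below, accepts the command to run as trailing arguments (MATLAB --version 9.12 matlab ...); please check SoftwareLinux before relying on that. The helper name run_matlab_pinned and the script my_analysis.m are hypothetical; -batch is a regular matlab command-line option for non-interactive runs.

```bash
# Sketch: run a Matlab script non-interactively with a pinned version.
# Assumption: the MATLAB environment wrapper takes the command to run as
# trailing arguments. run_matlab_pinned is a hypothetical helper name.

MATLAB_VERSION="9.12"   # pin the version instead of relying on the default

run_matlab_pinned() {
  # $1: Matlab statement to execute, e.g. the name of a script file
  MATLAB --version "$MATLAB_VERSION" matlab -batch "$1"
}

# Usage (my_analysis.m is a hypothetical script in the current directory):
#   run_matlab_pinned "my_analysis"
```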
Environments
In Unix it's common to use one tool per task. The institute's environment concept lets you pick tool versions independently of each other (see SoftwareServiceLinux).
Remember: activate the environment each time you need it.
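Because an environment has to be activated every time, scripts usually call each command through its wrapper. The following sketch shows one way to do that while keeping a record of which environment each step used. run_in_env, ENV_LOG and the log file name env-calls.log are hypothetical; the usage lines assume the wrapper syntax shown elsewhere on this page (R+ R, JUPYTER jupyter notebook).

```bash
# Sketch: call a command through an environment wrapper and log the call.
# run_in_env and ENV_LOG are hypothetical names, not part of the IT setup.

run_in_env() {
  # $1: environment wrapper (e.g. R+), remaining arguments: command to run
  local wrapper="$1"
  shift
  # Append a timestamped line so the script documents which environment it used.
  printf '%s %s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$wrapper" "$*" \
    >> "${ENV_LOG:-env-calls.log}"
  "$wrapper" "$@"
}

# Usage (analysis.R is a hypothetical file):
#   run_in_env R+ Rscript analysis.R
#   run_in_env JUPYTER jupyter notebook
```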
If you have any questions about the concept, don't hesitate to ask IT.
FAQ
I need the version number of a package on a certain computer in the past. What can I do?
IT stores historical package version records for several years. If you need to know the version number of a package on a certain computer in the past, please
write a ticket.
Which generation does computer X belong to?
1. A workstation before login:
For quick identification, the wallpaper of the login screen is identical on all workstations in the same generation.
2. The workstation you're logged into:
Type this command into a shell:
user@host > distri -g
It will show the numerical software platform generation of the computer.
3. A remote compute server:
When logging into a computer via ssh or getserver -s, the generation number is always shown (look at "OS:"):
I: Connecting via SSH to 'silbermond'
Linux silbermond 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64
_ This is a Interactive compute server
(v) OS : Generation 7, Kernel: 5.10.0-28
//_\\ Hardware : 376.57 GiB, 32x3600 bMIPS, CPU: Xeon Silver 4108 1.80GHz
(U_U) Where : C-20/rack2:35-38
Services : ComputeLinuxDedicated
_
/ \ This server is restarted 7am on the 1st Wednesday each month.
/ ! \
/_____\
If you're already logged in, the distri -g command will work as well.
4. In a script:
It's good practice for programs to check the assumptions they make about their environment. To test whether a script is running on a generation 7 computer, put this line at the beginning:
distri -g | grep -qw 7 || exit 1
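The one-liner above exits silently. In longer scripts, a small helper that also says why it stops can save some debugging time. This is only a sketch: require_generation is a hypothetical name, and it assumes distri -g prints just the generation number, as described above.

```bash
# Sketch: stop early if the script runs on an unexpected platform generation.
# Assumes 'distri -g' prints only the generation number (e.g. "7").
# require_generation is a hypothetical helper name.

require_generation() {
  local expected="$1" actual
  if ! actual="$(distri -g 2>/dev/null)"; then
    echo "Cannot determine the platform generation (is distri available?)" >&2
    return 2
  fi
  actual="${actual//[[:space:]]/}"   # tolerate stray spaces or newlines
  if [ "$actual" != "$expected" ]; then
    echo "Expected generation $expected, but this computer is generation $actual" >&2
    return 1
  fi
}

# Usage at the top of a script:
#   require_generation 7 || exit 1
```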
What do CAPITALLETTERS+ (e.g. R+) commands do?
These commands are environment wrappers that change the version selection of locally installed packages. Apart from that, they behave identically to other environment wrappers.
Example:
- R is sometimes installed directly on computers (because other packages depend on it).
- The R release installed locally is usually very old, and you'd have a hard time installing recent extensions on it.
- Most researchers in the institute use R+, an environment that lets them select either a specific or a more recent version.
- To run a recent version of R, type R+ R instead of just R.
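To see the difference on a particular computer, you can compare the R version found directly with the one the R+ environment selects. This sketch assumes that R+ accepts a command the same way as in R+ R, and that Rscript (part of standard R installations) is available both locally and inside the environment; show_r_versions is a hypothetical helper.

```bash
# Sketch: compare the locally installed R with the version selected by R+.
# Assumptions: 'R+' accepts a command like 'R+ R' does, and Rscript exists
# in both places. show_r_versions is a hypothetical helper name.

show_r_versions() {
  local expr='cat(R.version.string)'
  printf 'local: %s\n' "$(Rscript -e "$expr")"
  printf 'R+:    %s\n' "$(R+ Rscript -e "$expr")"
}
```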
How to use Jupyter Notebook?
It's usually easiest to run Jupyter Notebook on the computer your interactive session runs on.
The reason:
- Jupyter Notebook is a small web server that opens a web browser (Firefox by default) to render the notebook.
- Firefox can only run once per account in the whole institute.
- Your browser is usually already running on the computer your graphical session runs on.
If you want to run Jupyter on a different computer, there's a solution, but unfortunately no easy way to automate it:
- Use the browser in your graphical session.
- Make an SSH tunnel to connect to Jupyter's little web server.
user@mycomputer> getserver -sL
Given that we're connected to someserver now:
user@someserver> JUPYTER jupyter notebook
A URL like this will be shown:
http://localhost:8890/?token=b5e109e9c284ae95daac704e14ca6ba21779a20ab1dca32b
Remember the port number (8890 in this case).
Back on your computer, type this command in another terminal to establish an SSH tunnel:
user@mycomputer> ssh -L 8890:localhost:8890 someserver
The -L option forwards port 8890 on your computer to port 8890 on someserver. Then open the URL from the other terminal in your browser. Both terminals can be minimized after that. BTW: This tunnel solution can be applied to custom Jupyter installations as well.
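Because the port can differ between sessions, it may help to derive the tunnel command from the URL Jupyter prints. A minimal sketch (jupyter_tunnel_cmd is a hypothetical helper, not an institute tool): it only prints the command, so you can check it before running it in the second terminal.

```bash
# Sketch: build the SSH tunnel command from the URL Jupyter prints.
# jupyter_tunnel_cmd is a hypothetical helper, not an institute tool.

jupyter_tunnel_cmd() {
  local url="$1" server="$2" rest hostport port
  rest="${url#*://}"          # drop the scheme: localhost:8890/?token=...
  hostport="${rest%%/*}"      # keep host and port: localhost:8890
  port="${hostport##*:}"      # keep the port: 8890
  case "$port" in
    ''|*[!0-9]*)
      echo "No port number found in '$url'" >&2
      return 1 ;;
  esac
  printf 'ssh -L %s:localhost:%s %s\n' "$port" "$port" "$server"
}

# Usage on your own computer:
#   jupyter_tunnel_cmd 'http://localhost:8890/?token=...' someserver
```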
How to use Singularity software?
Some software packages are provided as "software containers". These packages can be used via the sc command. Find a list of these packages and their IDs at SoftwareLinux.
The physical location of the container repository is /data/p_SoftwareServiceLinux_sc. However, you don't need to know that, and you should not use these containers without the sc command.
To use a software package, you need its ID, e.g. fsl. The following commands come in handy:
This will show the available versions of a given containerized software package:
user@host > sc fsl
This will start a shell in the latest available FSL container environment:
user@host > sc fsl latest
This will start a shell in a specific FSL container environment:
user@host > sc fsl 6.0.4
This will run a command in the containerized environment (a version can be given instead of latest):
user@host > sc fsl latest eddy_cuda9.1 ...
You have access to all storage resources of the institute in a container. Example:
user@host > sc fsl latest ls -la /afs/cbs.mpg.de /data/dt_transfer
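In scripts, naming a version instead of latest keeps results reproducible when a newer container is added. A sketch, assuming the sc <ID> <version> <command> form shown above; run_fsl and FSL_VERSION are hypothetical names, and 6.0.4 is simply the version used in the example above.

```bash
# Sketch: run FSL commands from a pinned container version.
# run_fsl and FSL_VERSION are hypothetical names.

FSL_VERSION="6.0.4"   # pinned on purpose; avoid 'latest' in scripts

run_fsl() {
  # Write the container version to stderr so it shows up in job logs.
  echo "fsl container ${FSL_VERSION}: $*" >&2
  sc fsl "$FSL_VERSION" "$@"
}

# Usage (input.nii.gz is a hypothetical file; fslinfo is an FSL tool):
#   run_fsl fslinfo input.nii.gz
```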
How to install software by myself?
IT is very strict about software installation whenever installing it requires administrative permissions. One important reason is that changes made to a computer's software setup with administrative permissions might subtly affect other users. However, software installation can be as easy as unpacking a zip file into a folder and running a program in it. When it comes to research-related software, this is perfectly fine.
Choose a suitable installation location depending on the context you want to use the software in. Each user can get a personal storage block to install and customize software in. Request it here.
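A simple layout that fits the reproducibility goals of this page is one directory per tool and version inside your personal storage block. The sketch below assumes the block is mounted at /data/u_<username>_software, following the example path /data/u_someuser_software mentioned earlier; the actual path of your block may differ. install_archive is a hypothetical helper and only handles .zip (via unzip) and .tar.gz/.tgz archives.

```bash
# Sketch: unpack a downloaded archive into <prefix>/<name>/<version>.
# install_archive is a hypothetical helper; the default prefix follows the
# example path /data/u_someuser_software and may differ for your block.

install_archive() {
  local archive="$1" name="$2" version="$3"
  local prefix="${4:-/data/u_${USER}_software}"
  local target="$prefix/$name/$version"

  case "$archive" in
    *.zip|*.tar.gz|*.tgz) ;;
    *) echo "Unsupported archive type: $archive" >&2; return 1 ;;
  esac
  if [ -e "$target" ]; then
    echo "$target already exists; refusing to overwrite it" >&2
    return 1
  fi
  mkdir -p "$target" || return 1

  if [[ "$archive" == *.zip ]]; then
    unzip -q "$archive" -d "$target" || return 1
  else
    tar -xzf "$archive" -C "$target" || return 1
  fi
  echo "$target"
}

# Usage (tool.zip is a hypothetical download):
#   dir="$(install_archive tool.zip mytool 1.2.3)" && ls "$dir"
```

Keeping each version in its own directory means a script can refer to an exact version path, and installing a newer release never changes what older scripts run.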
Hints/Warnings:
- Do not install security-sensitive software by yourself. This explicitly includes
  - interactive software communicating over the internet (e.g. web browsers)
  - software parsing complex data types (e.g. PDF editors)
- If the software you want to install is relevant for a bigger group of people, contact IT and ask for it to be installed by IT. This will save time and prevent problems related to operating system updates (which are likely to affect self-installed software).