Using Python at the MPI/CBS


Overview

Python is an easy-to-understand programming language that is well suited for research. A myriad of packages (especially research-related ones) are freely available and can be plugged together to build custom computational programs, processing pipelines, stimulus presentations, and more.

It is important to think about how to structure your environment (which usually consists not just of your scripts but also of the installed Python packages and possibly other resources). If you're unsure, use Conda .

Package management methods

The methods are ordered by how likely IT perceives them to be used.

1. Virtual python environments using "Conda"

Conda is a tool for managing virtual environments consisting of Python itself and a set of packages. It is freely available, although the company behind it has commercial offers as well.

You can do the installation manually. However, the institute provides a simple script ( install-conda ) which automatically circumvents some problems that could otherwise arise. It is used like this:

user@host > install-conda

This is why there is an institute-specific installer:
  • Conda installs itself into the home directory by default, if not told otherwise. This is a problem because of http://topic.cbs.mpg.de/details#smallhome .
  • Conda activates itself automatically by default, if not told otherwise. This is a problem since it might prevent you from logging in (especially to RemoteLinux) in certain situations.

You'll have to open a new terminal once after that.
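
If you prefer to do the installation manually instead, the same two pitfalls can be avoided with standard installer options. A minimal sketch, assuming the Miniconda installer and a project directory /data/pt_12345 as the install target (both are placeholders, not a recommendation):

user@host > bash Miniconda3-latest-Linux-x86_64.sh -b -p /data/pt_12345/conda

user@host > /data/pt_12345/conda/bin/conda config --set auto_activate_base false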

To use Conda:

user@host > ca

ca is MPI/CBS-specific because local IT likes shortcuts. Feel free to use the official command conda activate instead. For more information on how to use Conda, refer to the official documentation: https://docs.conda.io/projects/conda/en/stable/user-guide/getting-started.html
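
For example, to leave the active environment again or to see which environments exist, these standard Conda commands can be used:

user@host > conda deactivate

user@host > conda env list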

2. None at all

In the unlikely case that your script is very simple (e.g. simple text processing) and no additional packages are required, you don't need to worry about Python packages at all. You just have your file (e.g. myscript.py ), which you call like this:

user@host > python3 myscript.py
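
As an illustration, such a script could be as minimal as the following sketch (hypothetical content; the shebang line in the first row is what later allows it to be run directly as a command):

#!/usr/bin/env python3
# myscript.py - counts the lines of a text file given as the first argument
import sys

with open(sys.argv[1]) as f:
    print(sum(1 for _ in f))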

To make life easier, create a folder bin in your home directory, put the script there, and make sure it's executable:

user@host > mkdir -p ~/bin

user@host > cp myscript.py ~/bin

user@host > chmod u+x ~/bin/myscript.py

If ~/bin didn't exist before, you might have to log in again once. After that, the script can be called from anywhere since it has become a shell command. Example:

user@host > cd /data/pt_12345

user@host > myscript.py mydatafile.txt

3. Simple platform aware wrapper (CUSTOMPYTHON)

You should not use this method anymore. IT will continue to support it, but it's usually more convenient for you to stick to Conda .

CUSTOMPYTHON is a script which does these things:
  • Set the Python module search path in a way that takes into account which software platform generation it is being run under.
  • Make sure this path is in your software storage block and not in your home directory.
  • Set this path as the target for package installations via pip.
  • Provide a wrapping function that lets you run your custom scripts via a simple call and have them use your installed Python packages.

Example use:

Switching to the custom python environment:

user@host > CUSTOMPYTHON

Let's install a package which is not there, yet:

custompython_user@host > pip install scikit-learn

The package should be importable now:

custompython_user@host > python3 -c 'import sklearn'

custompython_user@host >

No error, it works. We exit the environment ...

custompython_user@host > exit

... and try this again:

user@host > python3 -c 'import sklearn'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'sklearn'

This failure is expected: outside of the CUSTOMPYTHON environment, Python doesn't know where to look for the package. However, any command that is "wrapped" in CUSTOMPYTHON will know:

user@host > CUSTOMPYTHON python3 -c 'import sklearn'

No error, it works. This works not just for direct Python calls but also, e.g., for shell scripts that call Matlab scripts that in turn call Python scripts.
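
For instance, if your whole pipeline were started by a (hypothetical) shell script run_pipeline.sh, you could wrap the entire call in the same way:

user@host > CUSTOMPYTHON ./run_pipeline.sh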

4. Singularity

OK, this one's pretty hardcore. Say you have constructed an elaborate pipeline using customized scripts, Python packages, etc. To ensure that this pipeline is stable (i.e. does the same thing in 10 years that it does today), the whole environment (basically all the libraries) has to be included in your concept. This is where Singularity comes in. The idea is:
  • You select a base operating system (a Linux distribution) that is installed into a virtual environment.
  • You define which commands have to be run to prepare your environment.
  • The virtual environment is packaged into an image.
  • The image can be run anywhere - e.g. in the Institute or at the MPCDF.
This kind of deployment is rarely done. If you want to do it, please contact IT - we'll gladly help you.
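
To give an impression of what this looks like in practice, here is a minimal sketch of a Singularity definition file (the base image and package names are placeholders, not a recommendation):

Bootstrap: docker
From: ubuntu:22.04

%post
    # commands that prepare the environment inside the image
    apt-get update && apt-get install -y python3 python3-pip
    pip3 install numpy scikit-learn

%runscript
    # what happens when the image is run
    exec python3 "$@"

Such a definition file would typically be built into an image with singularity build (which may require elevated privileges or --fakeroot, one more reason to involve IT) and then executed with singularity run.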

FAQ

How to quickly make a Conda base environment and install some Python stuff?

  1. Install Conda (see SoftwarePython )
  2. Open a new terminal window.
  3. Make an environment:
    user@host > conda create -n myenv
  4. Change into the new environment:
    user@host > ca myenv
    ( ca is short for conda activate )
  5. Install some fancy package:
    (myenv) user@host > conda install numpy
  6. Enjoy:
    (myenv) user@host > python -c 'import numpy'
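
If you want to be able to recreate the environment later (e.g. on another machine), Conda can export its package list to a file and rebuild an environment from it:

    (myenv) user@host > conda env export > environment.yml

    user@host > conda env create -f environment.yml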
