Using Python at the MPI/CBS
Overview
Python is an easy-to-understand programming language perfectly suited for research. Myriads of packages (especially research-related ones) are freely available and can be plugged together to form custom computational programs, processing pipelines, stimulus presentations, etc.
It is important to think about how to structure your environment (which usually consists not just of your scripts but also of the installed Python packages and probably other resources as well). If you're unsure, use
Conda .
Package management methods
The methods are ordered by IT's perceived likelihood of their being used.
1. Virtual python environments using "Conda"
Conda is a tool for managing virtual environments consisting of Python itself and a set of packages. It is freely available, although the company behind it also has commercial offerings.
You can do the installation manually. However, the institute provides a simple script (
install-conda
) which automatically circumvents some problems that could arise. This is how it's used:
user@host >
install-conda
This is why there's an institute specific installer:
- Conda installs itself to the home directory by default if not told otherwise. This is a problem because of http://topic.cbs.mpg.de/details#smallhome .
- Conda activates itself automatically by default if not told otherwise. This is a problem since it might prevent you from logging in (especially to RemoteLinux) in certain situations.
You'll have to open a new terminal once after that.
To use Conda:
user@host >
ca
ca
is MPI/CBS specific because local IT likes shortcuts. Feel free to use the official command
conda activate
instead. Refer to the official documentation on how to use Conda:
https://docs.conda.io/projects/conda/en/stable/user-guide/getting-started.html
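If you want your environment to be reproducible, Conda can also create it from a declarative file. A minimal sketch of such an environment.yml (the name myenv, the channel, and the package list are just placeholder examples):

```yaml
name: myenv
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
```

An environment is created from such a file with conda env create -f environment.yml, and an existing environment can be written out with conda env export > environment.yml.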
2. None at all
In the unlikely case that your script is very simple (e.g. simple text processing) and no additional packages are required, you don't need to worry about Python packages. You'd just have your file (e.g.
myscript.py
) which you call like this:
user@host >
python3 myscript.py
To make life easier, create a folder
bin
in your home directory, put the script there, and ensure that it's executable:
user@host >
mkdir -p ~/bin
user@host >
cp myscript.py ~/bin
user@host >
chmod u+x ~/bin/myscript.py
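Note that calling the script directly only works if its first line is a shebang that tells the shell which interpreter to run. A minimal sketch of what myscript.py could look like (the line-counting behavior is just a made-up example):

```python
#!/usr/bin/env python3
# Hypothetical example script: print the number of lines in each file
# given on the command line.
import sys

def count_lines(path):
    """Return the number of lines in the file at `path`."""
    with open(path) as f:
        return sum(1 for _ in f)

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}: {count_lines(path)} lines")
```

Without the #!/usr/bin/env python3 line, the shell would try to run the file as a shell script and fail.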
If
~/bin
didn't exist before, you might have to log in again once. After that, the script can be called from anywhere, since it has become a shell command. Example:
user@host >
cd /data/pt_12345
user@host >
myscript.py mydatafile.txt
3. CUSTOMPYTHON
You should not use this method anymore. IT will continue to support it, but it's usually more convenient for you to stick to
Conda .
CUSTOMPYTHON is a script which does these things:
- Set the Python module search path in a way that takes the software platform generation it's run under into account.
- Make sure the path is in your software storage block and not in your home directory.
- Set the target path for package installation via
pip
to this path.
- Provide a wrapper function which enables you to run your custom scripts via a simple call and have them use your installed Python packages.
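The real script is institute-specific, but the mechanism can be sketched with standard environment variables. Everything below (the path, the variable values) is a hypothetical stand-in, not the actual CUSTOMPYTHON implementation:

```shell
# Hypothetical sketch of the mechanism behind a wrapper like CUSTOMPYTHON.
PKGDIR="/tmp/custompython-demo"   # stand-in for the real software storage path
mkdir -p "$PKGDIR"
export PYTHONPATH="$PKGDIR"       # Python adds this to its module search path
export PIP_TARGET="$PKGDIR"       # pip installs packages into this directory
# Commands started from here inherit the variables and find the packages:
python3 -c 'import sys; print("/tmp/custompython-demo" in sys.path)'
```

The last command prints True because the directory is now on Python's module search path; any child process (a shell script, Matlab, another Python) inherits the same variables.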
Example use:
Switching to the custom python environment:
user@host >
CUSTOMPYTHON
Let's install a package which isn't there yet:
custompython_user@host >
pip install scikit-learn
The package should be importable now:
custompython_user@host >
python3 -c 'import sklearn'
custompython_user@host >
No error, it works. We exit the environment ...
custompython_user@host >
exit
... and try this again:
user@host >
python3 -c 'import sklearn'
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'sklearn'
This failure is expected since outside of the CUSTOMPYTHON environment, Python doesn't know where to look for the package. However, any command which is "wrapped" in CUSTOMPYTHON will know:
user@host >
CUSTOMPYTHON python3 -c 'import sklearn'
No error, it works. This works not just for direct Python calls but also e.g. for shell scripts which call Matlab scripts which in turn call Python scripts.
4. Singularity
OK, this one's pretty hardcore. Say you have constructed an elaborate pipeline using customized scripts, Python packages, etc. To ensure that this pipeline is stable (i.e. does the same thing in 10 years that it's doing today), the whole environment (basically all the libraries) has to be included in your concept. This is where Singularity comes in. The idea is:
- You select a base operating system (a Linux distribution) that is installed into a virtual environment.
- You define which commands have to be run to prepare your environment.
- The virtual environment is packaged into an image.
- The image can be run anywhere - e.g. in the Institute or at the MPCDF.
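For illustration, the steps above can be sketched as a minimal Singularity definition file; the base image and the installed packages are just placeholder examples:

```
Bootstrap: docker
From: ubuntu:22.04

%post
    # Commands that prepare the environment inside the image
    apt-get update && apt-get install -y python3 python3-pip
    pip3 install numpy

%runscript
    # What happens when the finished image is run
    exec python3 "$@"
```

The image would then be built with singularity build myimage.sif myimage.def and the resulting .sif file can be copied to and run on any machine with Singularity installed.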
This kind of deployment is rarely done. If you want to do it, please contact IT - we'll gladly help you.
FAQ
How to quickly make a Conda base environment and install some Python stuff?
- Install Conda (see SoftwarePython )
- Open a new terminal window.
- Make an environment:
user@host >
conda create -n myenv
- Change into the new environment:
user@host >
ca myenv
( ca
is short for conda activate
)
- Install some fancy package:
(myenv) user@host >
conda install numpy
- Enjoy:
(myenv) user@host >
python -c 'import numpy'