Using Python at the MPI/CBS
Overview
Python is an easy-to-understand programming language, perfectly suited for research. Countless packages (especially research-related ones) are freely available and can be combined into custom computational programs or processing pipelines, used to create stimuli, etc.
It is important to think about how to structure your environment, which usually consists not only of your scripts but also of the installed Python packages and possibly other resources. If you are unsure, use Conda (method 1 below).
Package management methods
The methods are ordered by how likely IT considers each of them to be used.
1. Virtual python environments using "Conda"
Conda is a tool for managing virtual environments consisting of Python itself and a set of packages. It is freely available, although the company behind it also sells commercial offerings.
The official Anaconda/Miniconda/... installers are known to cause problems at the institute. The institute-provided installer avoids them. Known problems:
- The official Anaconda repository is blocked at the institute. The institute installer configures Conda accordingly.
- Conda tends to cause problems in shell configuration files. Find out more here.
- By default, Conda installs into your home directory, which is unsuitable at the institute because of Conda's storage requirements.
To install Conda at the institute, run this command:
user@host > install-conda
Hints:
- If you do not have a "software storage block" yet, get one here.
- After the installation, open a new terminal once. In all terminals opened from then on, you can use the shortcut ca to call conda activate.
- ca is MPI/CBS specific because local IT likes shortcuts.
- This is why there is an institute-specific installer:
  - Conda installs itself in the home directory unless told otherwise. This is a problem because of the small home directories (see http://topic.cbs.mpg.de/details#smallhome).
  - Conda activates itself automatically unless told otherwise. This is a problem because it might prevent you from logging in (especially to RemoteLinux) in certain situations.
- Conda can easily be integrated into PyCharm.
- To use Conda at home or on your laptop, find instructions and the required download here: https://github.com/conda-forge/miniforge
- For more information on how to use Conda, refer to the official documentation: https://docs.conda.io/projects/conda/en/stable/user-guide/getting-started.html
Useful commands
Command | Explanation
conda create -n exp | Creates a new Conda environment called exp
ca exp | Switches to the exp environment
conda install notebook | Installs Jupyter Notebook. Run it within the respective Conda environment via user@host > jupyter-notebook
conda search '*numpy*' | Substring search in the packages available within the Conda ecosystem
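conda search asks the configured channels what is available for installation. To check what is already installed in the active environment, a few lines of standard-library Python can do a similar wildcard search. This is only an illustrative sketch: installed_matching is a hypothetical helper, and it sees only packages that carry Python package metadata, which may not cover every Conda package (e.g. non-Python libraries).

```python
import fnmatch
from importlib import metadata


def installed_matching(pattern, names=None):
    """Return installed distribution names matching a shell-style pattern.

    `pattern` uses glob wildcards similar to conda search, e.g. '*numpy*'.
    If `names` is None, the distributions visible to the running
    interpreter are inspected; otherwise only the given names are filtered.
    """
    if names is None:
        # .get() returns None instead of failing for entries without a name
        names = {dist.metadata.get("Name") for dist in metadata.distributions()}
    # Match case-insensitively and skip entries without a name
    return sorted(n for n in names if n and fnmatch.fnmatch(n.lower(), pattern.lower()))


if __name__ == "__main__":
    # Hypothetical usage inside an activated environment:
    # (exp) user@host > python find_pkgs.py
    for name in installed_matching("*numpy*"):
        print(name)
```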
2. None at all
In the unlikely case that your script is very simple (e.g. basic text processing) and needs no additional packages, you do not need to worry about Python packages at all. You just have your script file (e.g. myscript.py), which you run like this:
user@host > python3 myscript.py
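To make life easier, create a folder bin in your home directory, put the script there, and make sure that it is executable (for this to work, the first line of the script must be a shebang such as #!/usr/bin/env python3, so that the shell knows to run it with Python):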
user@host > mkdir -p ~/bin
user@host > cp myscript.py ~/bin
user@host > chmod u+x ~/bin/myscript.py
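If ~/bin did not exist before, you might have to log in again once (on many Linux systems, ~/bin is only added to the command search path PATH at login if it exists). After that, the script can be called from anywhere, just like any other shell command. Example: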
user@host > cd /data/pt_12345
user@host > myscript.py mydatafile.txt
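For illustration, a script of this kind might look like the sketch below. The line and word counting is a hypothetical stand-in for your own processing. The important part is the first line (the shebang), which lets the shell run the file with python3 when it is called as myscript.py from ~/bin. Since only the standard library is used, no package management is needed.

```python
#!/usr/bin/env python3
"""myscript.py: count lines and words in text files (standard library only)."""
import sys


def count_lines_words(path):
    """Return (number of lines, number of whitespace-separated words) in `path`."""
    lines = words = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            lines += 1
            words += len(line.split())
    return lines, words


if __name__ == "__main__":
    # Print one result line per file given on the command line
    for name in sys.argv[1:]:
        n_lines, n_words = count_lines_words(name)
        print(f"{name}: {n_lines} lines, {n_words} words")
```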
3. Singularity
OK, this one is pretty hardcore. Say you have built an elaborate pipeline using customized scripts, Python packages, etc. To ensure that this pipeline still does the same thing in 10 years, the whole environment, including all libraries, has to be part of your setup. This is where Singularity comes in. The idea is:
- You select a base operating system, e.g. a Linux distribution, which is installed into a virtual environment.
- You define which commands have to be run to prepare your environment.
- The virtual environment is packaged into an image.
- The image can be run anywhere Singularity is available, e.g. at the institute or at the MPCDF.
This kind of deployment is rarely done. If you want to do it, please contact IT; we will gladly help you.
FAQ
How do I quickly create a Conda environment and install some Python packages?
- Install Conda (see SoftwarePython).
- Open a new terminal window.
- Create an environment:
user@host > conda create -n myenv
- Activate the new environment:
user@host > ca myenv
(ca is short for conda activate)
- Install some fancy package:
(myenv) user@host > conda install numpy
- Enjoy:
(myenv) user@host > python -c 'import numpy'
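To double-check from within Python which environment you are in, the interpreter can report its own location. A minimal sketch, assuming that conda activate (or ca) has set the CONDA_DEFAULT_ENV variable, which Conda normally does; describe_environment is a hypothetical helper and numpy is only an example package.

```python
import importlib.util
import os
import sys


def describe_environment(package="numpy"):
    """Report which interpreter is running and whether `package` can be found."""
    return {
        # Path of the running Python binary, e.g. .../conda/envs/myenv/bin/python
        "executable": sys.executable,
        # Root directory of the environment this interpreter belongs to
        "prefix": sys.prefix,
        # Name of the activated Conda environment, or None outside Conda
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # True if the package is findable, checked without importing it
        "has_package": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Saved as a file (e.g. the hypothetical check_env.py), it would be run as (myenv) user@host > python check_env.py.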
How can I use environment files to create Conda environments?
You do not have to type all your commands by hand every time you create a Conda environment. You can create an environment file, which you can conveniently share with your colleagues. See a simple example below:
name: spyder_env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - spyder
Create the environment by entering:
user@host > conda env create -f PATH_TO_FILE
and activate it with:
user@host > conda activate spyder_env
See this page for further reading.
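If you maintain several similar environments, such files can also be generated by a script and then shared. A minimal sketch using plain string formatting rather than a YAML library; render_environment_file is a hypothetical helper, and it neither quotes nor validates values, so keep names and package specifications simple.

```python
def render_environment_file(name, channels, dependencies):
    """Return the text of a Conda environment file (YAML) as a string."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Redirect the output into a file, e.g.:
    # user@host > python make_env.py > spyder_env.yml
    print(render_environment_file("spyder_env", ["conda-forge"], ["python=3.10", "spyder"]), end="")
```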
How to use a GUI (e.g. Spyder) together with Conda
Basic idea:
- Activate the respective environment in a terminal, e.g. via:
user@host > ca myenv
- Find out the path of the Conda environment's interpreter (the python binary). Example:
(myenv) user@host > which python
/data/u_user_software/conda/envs/myenv/bin/python
Copy this file path for later.
- Start the GUI inside the environment, e.g.:
(myenv) user@host > spyder
- Make sure the GUI uses the correct interpreter.
- For Spyder, follow these steps:
- Select "Tools"/"Preferences", ...
- ... go to section "Python interpreter" ...
- ... select "Use the following Python interpreter" ...
- ... and paste the full path of the Conda environment's interpreter there.
- Spyder might complain about missing "spyder-kernels". Just copy and paste the suggested Conda command into a terminal in which your Conda environment is active. Example:
- Open a new terminal:
user@host > ca myenv
(myenv) user@host > conda install spyder-kernels=2.4
- The exact version of the package depends on the Spyder release.
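To verify that the GUI really runs the environment's interpreter, you can paste a small check into its Python console and compare against the path copied earlier. A minimal sketch; uses_interpreter is a hypothetical helper, and the path in the usage comment is the example path from above.

```python
import os
import sys


def uses_interpreter(expected_path):
    """Return True if the running Python is the interpreter at `expected_path`.

    Symbolic links are resolved on both sides, since .../bin/python may be
    a link to a versioned binary such as python3.10.
    """
    return os.path.realpath(sys.executable) == os.path.realpath(expected_path)


# Example, typed into Spyder's console:
# uses_interpreter("/data/u_user_software/conda/envs/myenv/bin/python")
```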
I am using CUSTOMPYTHON. How does it work? (Warning: CUSTOMPYTHON is no longer recommended)
You should not use this method anymore. IT will continue to support it, but it is usually more convenient for you to stick to Conda.
CUSTOMPYTHON is a script which does these things:
- It sets the Python module search path, taking into account the software platform generation it is run under.
- It makes sure that this path is in your software storage block and not in your home directory.
- It sets the installation target of pip to this path.
- It provides a wrapper that lets you run your custom scripts with a simple call, so that they use your installed Python packages.
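Whether your module search path points into your home directory, which the institute's storage rules discourage, can be checked from Python. A minimal sketch; entries_under_home is a hypothetical helper, and it only inspects the interpreter it runs in, not CUSTOMPYTHON's own configuration.

```python
import os
import sys


def entries_under_home(paths=None, home=None):
    """Return the module search path entries located inside the home directory."""
    paths = sys.path if paths is None else paths
    home = os.path.realpath(os.path.expanduser("~") if home is None else home)
    hits = []
    for entry in paths:
        if not entry:
            continue  # '' stands for the current directory; skip it
        resolved = os.path.realpath(entry)
        # commonpath equals home only if `resolved` is home or lies below it,
        # so /home/username2 is not mistaken for a subfolder of /home/user
        if os.path.commonpath([home, resolved]) == home:
            hits.append(entry)
    return hits


if __name__ == "__main__":
    for entry in entries_under_home():
        print("in home directory:", entry)
```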
Example use:
Switching to the custom Python environment:
user@host > CUSTOMPYTHON
Let's install a package that is not there yet:
custompython_user@host > pip install scikit-learn
The package should be importable now:
custompython_user@host > python3 -c 'import sklearn'
custompython_user@host >
No error, it works. We exit the environment ...
custompython_user@host > exit
... and try this again:
user@host > python3 -c 'import sklearn'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'sklearn'
This failure is expected since outside of the CUSTOMPYTHON environment, Python does not know where to look for the package. However, any command which is "wrapped" in CUSTOMPYTHON will know:
user@host > CUSTOMPYTHON python3 -c 'import sklearn'
No error, it works. This works not only for direct Python calls but also, for example, for shell scripts that call Matlab scripts that in turn call Python scripts.
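This page does not say how CUSTOMPYTHON extends the module search path. One common mechanism is the PYTHONPATH environment variable, and the sketch below uses it, as an assumption, to illustrate why a wrapped command finds a package while an unwrapped one does not. The module name mymodule and the helper run_python are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_python(code, extra_dir=None):
    """Run `code` in a new Python process and return its exit status.

    If `extra_dir` is given, it is prepended to PYTHONPATH for that process
    only, similar in spirit to wrapping a command in CUSTOMPYTHON.
    """
    env = dict(os.environ)
    if extra_dir is not None:
        old = env.get("PYTHONPATH")
        env["PYTHONPATH"] = extra_dir if not old else extra_dir + os.pathsep + old
    return subprocess.run([sys.executable, "-c", code], env=env,
                          capture_output=True).returncode


with tempfile.TemporaryDirectory() as package_dir:
    # Put a module into a directory Python does not search by default
    Path(package_dir, "mymodule.py").write_text("VALUE = 42\n")
    plain = run_python("import mymodule")                           # ModuleNotFoundError, non-zero
    wrapped = run_python("import mymodule", extra_dir=package_dir)  # import succeeds, 0
    print("plain:", plain, "wrapped:", wrapped)
```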