This article explains how to install mpi4py in your home directory in such a way that it can be used across different clusters, compilers and MPI libraries.
The Problem
mpi4py needs to be built for a specific combination of the following:
- Compiler
- MPI flavour
- CPU type
- Interconnect
- Python version
This means that if it is installed with pip install --user mpi4py, it will be tied to the exact combination of the above that was in use at install time.
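If you are not sure which MPI library an existing mpi4py installation was built against, you can inspect its build configuration; the path in the output below is purely illustrative:
$ python -c "import mpi4py; print(mpi4py.get_config())"
{'mpicc': '/path/to/the/mpicc/used/at/build/time'}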
On SCITAS clusters there is an environment variable, $SYS_TYPE, that captures the CPU type and the interconnect.
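For example, it might look like this (the value shown is the one used in the examples below):
$ echo $SYS_TYPE
x86_E5v4_Mellanox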
The solution is to use the Python virtualenv package, which allows multiple Python environments to co-exist.
Step-by-step guide
Make a directory for all your virtualenv projects (optional)
$ cd
$ mkdir virtualenv
Decide on your compiler/MPI and know where you are
For each combination we need to create a directory and so we choose the following naming convention:
${SYS_TYPE}_compiler_MPI
e.g.
$ cd virtualenv
$ mkdir ${SYS_TYPE}_gcc_mvapich
$ ls
x86_E5v4_Mellanox_gcc_mvapich
Note that for Intel there is only one MPI available, so ${SYS_TYPE}_intel will suffice as a name.
${SYS_TYPE} identifies the hardware type and looks something like x86_E5v4_Mellanox.
Optionally, if you intend to use multiple Python versions, you can also include the Python version in the name, for example by appending _py3 to give ${SYS_TYPE}_gcc_mvapich_py3.
Run virtualenv and install mpi4py
First check that the correct modules have been loaded!
$ module load gcc mvapich2 python
$ module list
Currently Loaded Modules:
  1) gcc/7.4.0   2) mvapich2/2.3.1   3) python/3.7.3
Now create the virtualenv, pointing it at the appropriate directory and Python version (2 or 3):
$ virtualenv -p python3 --system-site-packages virtualenv/${SYS_TYPE}_gcc_mvapich
Running virtualenv with interpreter /ssoft/spack/humagne/v1/opt/spack/linux-rhel7-x86_E5v4_Mellanox/gcc-7.4.0/python-3.7.3-5lm3vikrg4nq4tjhx76dgqy7zbt4kfam/bin/python3
Using base prefix '/ssoft/spack/humagne/v1/opt/spack/linux-rhel7-x86_E5v4_Mellanox/gcc-7.4.0/python-3.7.3-5lm3vikrg4nq4tjhx76dgqy7zbt4kfam'
New python executable in /home/user/virtualenv/x86_E5v4_Mellanox_gcc_mvapich/bin/python3
Also creating executable in /home/user/virtualenv/x86_E5v4_Mellanox_gcc_mvapich/bin/python
Installing setuptools, pip, wheel... done
Then activate the newly created virtual environment:
$ source virtualenv/${SYS_TYPE}_gcc_mvapich/bin/activate
(x86_E5v4_Mellanox_gcc_mvapich) [user@cluster ~]$
Note that the prompt changes as a reminder that the virtualenv is active.
Now we can install mpi4py using the --no-cache-dir option to make sure that it always gets rebuilt correctly:
(x86_E5v4_Mellanox_gcc_mvapich) [user@cluster]$ pip install --no-cache-dir mpi4py
Collecting mpi4py
  Downloading mpi4py-3.0.3.tar.gz (1.4 MB)
(...)
Successfully built mpi4py
Installing collected packages: mpi4py
Successfully installed mpi4py-3.0.3
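As an optional sanity check, you can ask the freshly built mpi4py which MPI library it was linked against; with the modules loaded above the output should mention MVAPICH2:
(x86_E5v4_Mellanox_gcc_mvapich) [user@cluster]$ python -c "from mpi4py import MPI; print(MPI.Get_library_version())"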
To leave the virtual environment simply type deactivate.
(x86_E5v4_Mellanox_gcc_mvapich) [user@cluster ~]$ deactivate
[user@cluster ~]$
Installing for another combination
We can repeat the above process for as many permutations as we need:
$ mkdir virtualenv/${SYS_TYPE}_intel
$ module purge
$ module load intel intel-mpi python
$ module list
Currently Loaded Modules:
  1) intel/18.0.5   2) intel-mpi/2018.4.274   3) python/3.7.3
$ virtualenv -p python3 --system-site-packages virtualenv/${SYS_TYPE}_intel
Running virtualenv with interpreter /ssoft/spack/humagne/v1/opt/spack/linux-rhel7-x86_E5v4_Mellanox/intel-18.0.5/python-3.7.3-t6azvyfvc6hq72fnyfzvereqk54ng4xk/bin/python3
Using base prefix '/ssoft/spack/humagne/v1/opt/spack/linux-rhel7-x86_E5v4_Mellanox/intel-18.0.5/python-3.7.3-t6azvyfvc6hq72fnyfzvereqk54ng4xk'
New python executable in /home/rmsilva/virtualenv/x86_E5v4_Mellanox_intel/bin/python3
Also creating executable in /home/rmsilva/virtualenv/x86_E5v4_Mellanox_intel/bin/python
Installing setuptools, pip, wheel... done.
$ source virtualenv/${SYS_TYPE}_intel/bin/activate
(x86_E5v4_Mellanox_intel) [user@cluster]$ pip install --no-cache-dir mpi4py
Collecting mpi4py
Installing collected packages: mpi4py
Successfully installed mpi4py-3.0.3
Using mpi4py
When you want to use mpi4py you need to load the corresponding modules and then activate the appropriate virtual environment:
$ module load intel intel-mpi python
$ source virtualenv/${SYS_TYPE}_intel/bin/activate
(x86_E5v2_IntelIB_intel) [eroche@deneb2 ~]$ python
Python 3.6.1 (default, Aug 19 2017, 20:39:41) [GCC Intel(R) C++ gcc 4.8.5 mode] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from mpi4py import MPI
>>> comm = MPI.COMM_WORLD
>>> print("Hello! I'm rank %d from %d running in total..." % (comm.rank, comm.size))
Hello! I'm rank 0 from 1 running in total...
Note how virtualenv knows which Python is associated with the environment, so simply typing python is sufficient.
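You can confirm this with which; the path below is illustrative and will reflect your own home directory and environment name:
(x86_E5v2_IntelIB_intel) [user@cluster ~]$ which python
/home/user/virtualenv/x86_E5v2_IntelIB_intel/bin/python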
The same applies to batch scripts - just source the virtualenv after loading your modules (see the example batch script below).
Launching MPI jobs
As with traditional MPI jobs, you need to use srun to launch them correctly:
srun python mympicode.py
Failure to use srun will result in only one rank being launched.
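As an illustration, a minimal batch script for the Intel environment built above might look like the following; the Slurm resource requests and the script name mympicode.py are placeholders to adapt to your own job:
#!/bin/bash
#SBATCH --nodes 2
#SBATCH --ntasks-per-node 4
#SBATCH --time 00:10:00

# Load the same modules that were used to build the virtualenv
module purge
module load intel intel-mpi python

# Activate the matching virtual environment
source $HOME/virtualenv/${SYS_TYPE}_intel/bin/activate

# Launch one Python process per task
srun python mympicode.py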
Notes for Deneb
mpi4py with Intel InfiniBand/Omni-Path
By default mpi4py is not fully compatible with the interconnect on the Deneb cluster, due to its use of some MPI 3.0 calls which are not supported.
The usual symptom is that communications between ranks will block or fail.
Please see the following page for the details: https://software.intel.com/en-us/articles/python-mpi4py-on-intel-true-scale-and-omni-path-clusters
The solution is to set the following variable just after importing mpi4py:
mpi4py.rc.recv_mprobe = False
A more explicit example is:
import mpi4py
mpi4py.rc.recv_mprobe = False
from mpi4py import MPI
comm = MPI.COMM_WORLD
..
..
When using Intel MPI (module load intel-mpi) it is also possible to work around the issue by changing the fabric protocol via "export I_MPI_FABRICS=shm:ofa".
This is not possible with MVAPICH2 or OpenMPI.
Different architectures and the GPU nodes
Deneb is a heterogeneous cluster with the following $SYS_TYPE values:
- x86_E5v2_IntelIB
- x86_E5v3_IntelIB
- x86_E5v2_Mellanox_GPU
The first two are cross-compatible for mpi4py, but if using the GPU nodes you should change to the appropriate SYS_TYPE before configuring mpi4py:
$ slmodules -s x86_E5v2_Mellanox_GPU -v
[INFO] S+L release: stable
[INFO] S+L systype: x86_E5v2_Mellanox_GPU
[INFO] S+L engaged!
$ echo $SYS_TYPE
x86_E5v2_Mellanox_GPU