Parallel Python#

Parallelism in Python can be achieved in many ways. At one end of the spectrum are the capabilities provided by Python's standard library, and at the other is the MPI standard.

Native Python high-level interface for asynchronously executing callables#

Since version 3.2, Python provides a high-level interface for asynchronously executing callables in the concurrent.futures module.

https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor

Thread pool#

concurrent.futures.ThreadPoolExecutor is an Executor subclass that uses a pool of threads to execute calls asynchronously.

  • Threads are lightweight execution units managed by the Python interpreter (and ultimately the OS) within a single process.

  • Threads within the same process share the same memory space.

  • In standard CPython (the most common Python implementation), the Global Interpreter Lock (GIL) prevents multiple threads from executing Python bytecode simultaneously within the same process.
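
A minimal sketch of a thread pool in use, assuming an I/O-bound task that is simulated here with time.sleep. The names fake_download and run_in_threads and the delay value are hypothetical. Because a sleeping or waiting thread releases the GIL, the calls can overlap even though only one thread executes Python bytecode at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item_id, delay=0.2):
    """Hypothetical I/O-bound task: waiting stands in for a network request."""
    time.sleep(delay)  # the GIL is released while a thread sleeps or waits on I/O
    return item_id * 2


def run_in_threads(item_ids, max_workers=4):
    # executor.map returns results in the order of the inputs,
    # regardless of which thread finishes first
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fake_download, item_ids))


if __name__ == "__main__":
    start = time.perf_counter()
    results = run_in_threads([1, 2, 3, 4])
    elapsed = time.perf_counter() - start
    print(results, f"{elapsed:.2f} s")
```

For CPU-bound pure Python code a thread pool of this kind would usually give little or no speed-up in standard CPython, because of the GIL.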

Process pool#

concurrent.futures.ProcessPoolExecutor is an Executor subclass that executes calls asynchronously using a pool of at most max_workers processes.

  • Each worker in the pool is a distinct operating system process, with its own memory space and Python interpreter instance.

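  • Because processes have independent memory spaces, arguments and return values are passed between them by pickling, so they must be picklable.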

  • Since each worker is a separate process with its own Python interpreter and memory space, each process also has its own GIL, so CPU-bound Python code can run in parallel across workers.
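
A minimal sketch of a process pool applied to CPU-bound work, assuming a deliberately simple prime-counting task. The names count_primes and count_primes_parallel are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, max_workers=2):
    # Each call runs in a separate worker process with its own interpreter and GIL.
    # Arguments and return values are pickled to cross the process boundary.
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(count_primes, limits))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block
    # when they import the main module
    print(count_primes_parallel([10_000, 20_000, 30_000]))
```

The worker function is defined at module level so that it can be pickled by reference and found again inside each worker process.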

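A real-world example of ProcessPoolExecutor in use is pyfm2d, specifically the function _calc_wavefronts_multithreading in inlab-geo/pyfm2d. Despite its name, the function uses a pool of processes rather than threads.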

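import concurrent.futures
import operator
from functools import reduce
from typing import Optional

import numpy as np

# WaveTrackerOptions, WaveTrackerResult and _calc_wavefronts_process are defined elsewhere in pyfm2d


def _calc_wavefronts_multithreading(
    v,
    recs,
    srcs,
    nthreads=2,
    extent=[0.0, 1.0, 0.0, 1.0],
    options: Optional[WaveTrackerOptions] = None,
) -> WaveTrackerResult:

    # Since this function is called when there are multiple sources, we can't specify a single source for the full field calculation
    # Although we could create a list of source indices...
    # Note: options must not be None here, because an attribute is set on it
    options.ttfield_source = -1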

    futures = []
    # https://docs.python.org/3/library/concurrent.futures.html
    with concurrent.futures.ProcessPoolExecutor(max_workers=nthreads) as executor:
        for i in range(np.shape(srcs)[0]):
            futures.append(
                executor.submit(
                    _calc_wavefronts_process, v, recs, srcs[i, :], extent, options
                )
            )
    result_list = [f.result() for f in futures]
    return reduce(operator.add, result_list)
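
The submit-then-reduce pattern above relies on the result type supporting the + operator. A minimal self-contained sketch of that pattern follows, assuming a hypothetical PartialResult class and a hypothetical worker process_source. Neither is part of pyfm2d, and how WaveTrackerResult actually combines results is not shown here.

```python
import operator
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class PartialResult:
    """Hypothetical per-source result; adding two results concatenates them."""
    values: list = field(default_factory=list)

    def __add__(self, other):
        return PartialResult(self.values + other.values)


def process_source(source):
    # Hypothetical stand-in for the per-source computation
    return PartialResult([source ** 2])


def process_all(sources, max_workers=2):
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # One task per source, submitted without waiting for earlier ones
        futures = [executor.submit(process_source, s) for s in sources]
        # Collecting in submission order keeps the combined result deterministic
        results = [f.result() for f in futures]
    return reduce(operator.add, results)


if __name__ == "__main__":
    print(process_all([1, 2, 3]))
```

Note that reduce without an initial value raises a TypeError on an empty list, so this sketch assumes at least one source.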

Native Python lower-level interfaces for asynchronously executing callables#

The aim of concurrent.futures is to hide some of the complexities of the underlying threading and multiprocessing mechanisms and thus to make parallel programming more accessible and less error-prone for common scenarios.

threading#

https://docs.python.org/3/library/threading.html#module-threading
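
A minimal sketch of using the threading module directly, where creating, starting and joining threads and collecting their results are all done by hand. The worker square_into and the helper run_threads are hypothetical names.

```python
import threading


def square_into(results, lock, key):
    """Hypothetical worker: store key squared in a dictionary shared by all threads."""
    value = key * key
    with lock:  # guard the shared dictionary while writing
        results[key] = value


def run_threads(keys):
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=square_into, args=(results, lock, k)) for k in keys
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before reading the results
    return results


if __name__ == "__main__":
    print(run_threads([1, 2, 3]))
```

Threads share memory, so the results dictionary is visible to all of them. The lock makes the shared write explicit, and it becomes essential as soon as an update involves more than a single operation.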

multiprocessing#

https://docs.python.org/3/library/multiprocessing.html

multiprocessing behaves differently on Windows and POSIX operating systems. The "fork" start method is only available on POSIX systems, so multiprocessing.set_start_method("fork") does not work on Windows, where multiprocessing.set_start_method("spawn") has to be used instead.

https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
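
A minimal sketch of choosing a start method in a portable way, assuming a hypothetical helper named pick_start_method. multiprocessing.get_all_start_methods() lists what the current platform supports, and multiprocessing.get_context() returns a context bound to one start method without changing the global default.

```python
import multiprocessing as mp


def pick_start_method(preferred="fork"):
    """Hypothetical helper: use the preferred method if available, else "spawn"."""
    available = mp.get_all_start_methods()
    return preferred if preferred in available else "spawn"


def square(x):
    return x * x


if __name__ == "__main__":
    # The guard is required with "spawn", which re-imports the main module
    # in each worker process
    ctx = mp.get_context(pick_start_method())
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))
```

Using a context object instead of set_start_method avoids interfering with other code in the same program that may rely on the default start method.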

Pipeline processing#

joblib#

joblib is a Python library that provides lightweight pipelining, including transparent disk caching of function results and simple parallel execution of loops.

https://joblib.readthedocs.io/en/stable/

Message Passing Interface#

mpi4py#

https://mpi4py.readthedocs.io/en/stable/index.html

MPI for Python provides Python bindings for the Message Passing Interface (MPI) standard and thus allows Python applications to use multiple processors on distributed-memory machines. This means scripts have to be run using mpiexec and have to use MPI API concepts such as scatter and gather.

https://mpi4py.readthedocs.io/en/stable/tutorial.html#running-python-scripts-with-mpi

Uniform interfaces#

schwimmbad#

schwimmbad implements parallel processing pools behind a uniform interface, which makes it easy to switch between local development (e.g., serial processing or multiprocessing) and deployment on a cluster or supercomputer (via, e.g., MPI or joblib).

https://schwimmbad.readthedocs.io/en/latest/