Parallelization in Python

Kanishk Varshney
4 min read · Nov 27, 2022


Photo by Hitesh Choudhary on Unsplash

Recently at work, we were seeing slowdowns in the production environment, and I was put on the performance monitoring and optimization effort. I analyzed and benchmarked the algorithms and found that, although the algorithms and libraries themselves were optimized, the large number of inputs had become the bottleneck in the system.

Most of the code was written either as a simple for loop or, in some services, as a parallel implementation based on the multiprocessing map method. Ideally, this implementation should have worked well, but our processes were generating image segmentation masks, and a blocking call wouldn’t yield much of a performance gain on a large dataset. When I explored the multiprocessing module in Python, I came across the following methods at my disposal:

apply()

apply_async()

map()

map_async()

imap()

imap_unordered()

starmap()

starmap_async()

I skimmed through the official documentation but found it a bit too verbose and overwhelming. So I spent some time on the net and wrote some basic code to understand the different scenarios and behaviors of these methods.

In this article, I have put together a brief, easy-to-grasp summary with basic code snippets to help you understand the methods and choose the best one for your use case. For a better understanding, I will be using the following sample functions and calling them 1000 times to benchmark the various implementations.

Target Input Function | Single Input
Target Input Function | Multiple Inputs
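As a minimal sketch of what such target functions could look like: the names square and multiply and their arithmetic are placeholders chosen for illustration, and only the 0.1 s sleep per call is taken from the benchmark setup described here.

```python
import time


def square(x):
    """Hypothetical single-input target: simulate 0.1 s of work, then square."""
    time.sleep(0.1)  # stand-in for the real workload
    return x * x


def multiply(x, y):
    """Hypothetical multi-input target: simulate 0.1 s of work, then multiply."""
    time.sleep(0.1)  # stand-in for the real workload
    return x * y
```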

Food for thought: Notice the 0.1s sleep inside the function. What will happen if we remove this delay?

A typical non-parallel, loop-based implementation looks as follows and takes ~1 min 42 sec to finish:

Loop-based Implementation
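A hedged sketch of such a baseline, using a hypothetical square target with a 0.1 s sleep. It runs only 10 calls so that it finishes in about a second; 1000 calls at 0.1 s each account for roughly 100 seconds of the run time quoted above.

```python
import time


def square(x):
    """Hypothetical target function: 0.1 s of simulated work per call."""
    time.sleep(0.1)
    return x * x


def run_loop(inputs):
    """Process every input sequentially, one call after another."""
    return [square(x) for x in inputs]


start = time.perf_counter()
results = run_loop(range(10))  # the article's benchmark uses 1000 calls
elapsed = time.perf_counter() - start
print(f"{len(results)} calls took {elapsed:.2f}s")
```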

We will use this as a baseline and compare the various multiprocessing variants on a 12-core CPU (so, 12 workers).

map()

map() is the basic, standard parallel processing implementation for the job. It simply distributes the jobs among the workers and blocks until all of them are done, returning the results as a list in input order.

Note: The map() method works only for single-argument functions. For multi-argument support, see starmap().

Threadpool Map Implementation
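A minimal sketch of a blocking map() call, assuming multiprocessing.pool.ThreadPool (as the caption suggests) and a hypothetical square target. multiprocessing.Pool exposes the same method, though a process pool should be created under an if __name__ == "__main__": guard. The input is trimmed to 24 items to keep the run short.

```python
import time
from multiprocessing.pool import ThreadPool


def square(x):
    """Hypothetical target function: 0.1 s of simulated work per call."""
    time.sleep(0.1)
    return x * x


# 12 workers, matching the 12-core machine used for the benchmark.
with ThreadPool(processes=12) as pool:
    # map() blocks until every input has been processed
    # and returns the results as a list in input order.
    results = pool.map(square, range(24))

print(results[:5])
```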

Whoa! The run time dropped from 102 seconds to only 8.85 seconds, close to the expected ~102/12 ≈ 8.5 seconds for 12 workers. QED!!

map_async()

map_async() is the non-blocking version of the aforementioned map() method. Instead of returning the results directly, it returns a multiprocessing.pool.MapResult object.

Threadpool Map Async Implementation

You will have to fetch the results from the MapResult object by calling its get() method, which blocks until they are ready.

Threadpool Map Async Implementation: Wait
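The trigger-then-fetch pattern can be sketched as follows, again assuming ThreadPool and a hypothetical square target; the variable other_work is only a placeholder for whatever the main thread does in the meantime.

```python
import time
from multiprocessing.pool import ThreadPool


def square(x):
    """Hypothetical target function: 0.1 s of simulated work per call."""
    time.sleep(0.1)
    return x * x


with ThreadPool(processes=12) as pool:
    # Returns immediately with a MapResult handle instead of the results.
    async_result = pool.map_async(square, range(24))

    # The main thread is free to do unrelated work while the pool runs.
    other_work = sum(range(100))

    # wait() blocks until the whole job is done; get() then returns the
    # results (calling get() alone would block in the same way).
    async_result.wait(timeout=30)
    results = async_result.get(timeout=30)

print(other_work, results[:5])
```

Note that get() is called inside the with block: leaving the block terminates the pool, so pending work must be collected first.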

Fetching the results from map_async() takes about as long as map(). It is simply the non-blocking version of map() and is useful where you don’t have to wait for the entire job to complete before moving on to other tasks. Just trigger and Map Ahead.

starmap()

As you will have noticed by now, and as noted above, the map() method works only with single-argument callables. If you want to parallelize functions with multiple arguments, you will have to use starmap(), which unpacks each item of the iterable into the arguments of one call.

Threadpool StarMap Implementation
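A minimal sketch of starmap(), assuming ThreadPool and a hypothetical two-argument multiply target; each tuple in the input is unpacked into one call.

```python
import time
from multiprocessing.pool import ThreadPool


def multiply(x, y):
    """Hypothetical two-argument target: 0.1 s of simulated work per call."""
    time.sleep(0.1)
    return x * y


# One tuple per call; starmap() unpacks it as multiply(*pair).
pairs = [(x, x + 1) for x in range(24)]

with ThreadPool(processes=12) as pool:
    results = pool.starmap(multiply, pairs)  # blocking, ordered like map()

print(results[:5])
```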

Whoa again! Same speed-up, but now with multiple arguments.

starmap_async()

starmap() also comes with an asynchronous variant, starmap_async(). It works similarly to map_async(), but for multi-argument functions.

Threadpool StarMap Async Implementation
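A hedged sketch of the asynchronous variant, with the same assumptions as before (ThreadPool, a hypothetical multiply target, and a shortened input list):

```python
import time
from multiprocessing.pool import ThreadPool


def multiply(x, y):
    """Hypothetical two-argument target: 0.1 s of simulated work per call."""
    time.sleep(0.1)
    return x * y


pairs = [(x, x + 1) for x in range(24)]

with ThreadPool(processes=12) as pool:
    # Returns a result handle right away; the work continues in the pool.
    async_result = pool.starmap_async(multiply, pairs)
    # get() blocks until all calls have finished and returns an ordered list.
    results = async_result.get(timeout=30)

print(results[:5])
```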

apply()

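The apply() method is used to issue one-off tasks: it calls the function once in a single worker, with the given argument tuple (and optional keyword arguments), and blocks until the result is ready. Because of that, it behaves much like calling the function directly, and its signature mirrors the old built-in apply() that Python 3 removed.

Outside a pool, func(*args, **kwargs) is preferred over the built-in apply(f, args, kwargs).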
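A small sketch of a pool's apply() method, assuming ThreadPool and a hypothetical multiply target. Each call runs in one worker and blocks until its result is back, so two calls in a row run one after the other.

```python
import time
from multiprocessing.pool import ThreadPool


def multiply(x, y):
    """Hypothetical two-argument target: 0.1 s of simulated work per call."""
    time.sleep(0.1)
    return x * y


with ThreadPool(processes=12) as pool:
    # Positional arguments are passed as a tuple.
    positional = pool.apply(multiply, args=(3, 4))
    # Keyword arguments are passed as a dict.
    keyword = pool.apply(multiply, args=(3,), kwds={"y": 5})

print(positional, keyword)
```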

apply_async()

This is the asynchronous variant of apply(). It submits a single call, with its argument tuple and optional keyword arguments, and immediately returns an AsyncResult object. It is better suited for performing tasks in parallel or when you don’t need to wait for the output (e.g., a function that generates output files).
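One way to use apply_async() for parallel work is to submit the calls one by one and collect the handles, as in this sketch (ThreadPool and the multiply target are assumptions made for illustration):

```python
import time
from multiprocessing.pool import ThreadPool


def multiply(x, y):
    """Hypothetical two-argument target: 0.1 s of simulated work per call."""
    time.sleep(0.1)
    return x * y


with ThreadPool(processes=12) as pool:
    # Each apply_async() call submits one task and returns an AsyncResult.
    pending = [pool.apply_async(multiply, args=(x, x + 1)) for x in range(24)]
    # get() blocks per task; by now the tasks are running concurrently.
    results = [task.get(timeout=30) for task in pending]

print(results[:5])
```

If the return values are not needed, the get() calls can be replaced by pool.close() followed by pool.join() so the tasks still finish before the pool is torn down.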

imap()

Most of you might be familiar with the concept of generators in Python. imap(), a lazier version of map(), works along the same lines. Instead of returning the results or a MapResult object, it returns an IMapIterator object, an iterator over the results.

Threadpool Imap Implementation

You can iterate over the results, which arrive in input order, to get the final output.
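A minimal sketch of consuming an imap() iterator, assuming ThreadPool and a hypothetical square target. The iterator is drained inside the with block, because leaving the block terminates the pool.

```python
import time
from multiprocessing.pool import ThreadPool


def square(x):
    """Hypothetical target function: 0.1 s of simulated work per call."""
    time.sleep(0.1)
    return x * x


with ThreadPool(processes=12) as pool:
    iterator = pool.imap(square, range(24))  # returns immediately
    first = next(iterator)   # blocks only until the first result is ready
    rest = list(iterator)    # remaining results, still in input order

results = [first] + rest
print(results[:5])
```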

imap_unordered()

Sometimes the order of the results doesn’t matter, and you want them in the order in which they finish processing rather than the order in which the requests were submitted. You can use imap_unordered() in such cases.
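To make the effect visible, this sketch uses a hypothetical square_with_delay target whose sleep grows with its input, so smaller inputs tend to finish first regardless of where they sit in the list. The exact arrival order depends on scheduling, so the code does not rely on it.

```python
import time
from multiprocessing.pool import ThreadPool


def square_with_delay(x):
    """Hypothetical target: larger inputs take longer to process."""
    time.sleep(0.02 * x)
    return x * x


inputs = [5, 1, 3, 0, 4, 2]

with ThreadPool(processes=6) as pool:
    # Results are yielded as soon as each task completes.
    results = list(pool.imap_unordered(square_with_delay, inputs))

print("arrival order:", results)
print("sorted:", sorted(results))
```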

Summary

To reiterate, the method you end up using will depend on your use case. You will have to consider multiple arguments, concurrency, blocking, and ordering.

You can also refer to the following explanations, to grasp the multiprocessing module:

Thanks for following through. I hope this short read helps you understand Python’s multiprocessing module and use it properly.
