'How do I speed up an optimized CPU-bound process that runs within a parallelized program in Python?

A Python program of mine uses the multiprocessing module to parallelize iterations of a search problem. Besides doing other things, each iteration loops over a CPU-expensive process that is already optimized in Cython. Because this process gets called multiple times while looping, this significantly slows down the total runtime.

What is the recommended way to achieve a speed-up in this case? As the expensive process can't be further CPU-optimized, I've considered parallelizing the loop. However, as the loop lives in an already parallelized (by multiprocessing) program, I don't think this would be possible on the same machine.

My research on this has failed to find any best practices or any sort of direction.



Solution 1:[1]

As a quick way to see if it might be possible to optimize your existing code, you might check your machines CPU usage while the code is running.

If all your cores are ~100% then adding more processes etc isn't likely to improve things.

In that case you could

1 - Try further algorithm optimisation (though best bang for the buck is to profile your code first to see where it's slow). Though if you've already been using Cython then likely this might have limited returns

2 - Try a faster machine and/or with more cores

Another approach however (one that I've used) is instead to develop a serverless design, and run your CPU intensive, parallel parts of your algorithm using any of the cloud vendors serverless models.

I've personally used AWS lamda, where we parallelized our code to run with 200+ simultaneous lambda processes, that is roughly equivalent to a 200+ core single machine.

For us, this essentially resulted in a 50-100 times increase in performance (measured as reduction in total processing time) compared to running on a 8-core server.

You do have to do more work to implement a serverless deployment model, and then wrapper code to manage everything, which isn't trivial. However the ability to essentially scale infinitely horizontally may potentially make sense for you.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Richard