What is the best way to parallelise this code?

I have a function that I would like to maximise using Python. However, evaluating this function is fairly slow, and I would like to speed it up by parallelising the code. I'm not very familiar with how to do this, so any help would be appreciated.

In short, I have a cost() function that is optimised using some library, e.g. scipy.optimize. This function calls another function, do_calculation(), 100 times and averages the results (what do_calculation() actually does doesn't matter, but its results have some statistical spread). This average is the quantity I want to maximise. However, evaluating it is quite expensive, especially hundreds of times over, so I would like to parallelise the evaluation of that mean. How could I go about doing this in an efficient way?

import numpy as np

def function():
    # run the expensive calculation 100 times and average the results
    val_list = []
    for i in range(100):
        val = do_calculation()
        val_list.append(val)
    return np.mean(np.array(val_list))

I was thinking about using multiprocessing to split up this loop, but then how do I collect the values computed in the different processes to calculate the final mean?



Solution 1:[1]

Using multiprocessing, you can use a process pool to map your function over a list of arguments:

from multiprocessing import Pool

import numpy as np

def do_calculation(idx):
    pass  # write your code here; ignore the idx argument if you don't need it

N_PROC = 2

def function():
    # distribute the 100 evaluations across N_PROC worker processes;
    # Pool.map collects the results back into a single list, in order
    with Pool(N_PROC) as p:
        val_list = p.map(do_calculation, range(100))
    return np.mean(np.array(val_list))
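
Note that if you run this with a spawn-based start method (e.g. on Windows), the call into function() should sit under an if __name__ == "__main__": guard, otherwise each worker re-imports the module and tries to start its own pool. A minimal usage sketch:

if __name__ == "__main__":
    result = function()
    print(result)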

Actually, since you are computing an average, which does not depend on the order of the results, you could also use imap_unordered; this may be advantageous when individual jobs take considerably different times, as in the sketch below.
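
A minimal sketch of that variant, reusing the placeholder do_calculation, N_PROC, and imports from the code above:

def function():
    # imap_unordered hands back each result as soon as a worker finishes,
    # so fast jobs are not held up behind slow ones; the order of results
    # is lost, but the mean does not depend on it
    with Pool(N_PROC) as p:
        val_list = list(p.imap_unordered(do_calculation, range(100)))
    return np.mean(np.array(val_list))

Note that imap_unordered returns an iterator, so it has to be consumed (here via list()) before the with block closes the pool.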

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Bob