Parallel map with nested functions that depend on modules

Here's a minimal example of what I'm trying to parallelize:

import numpy as np

def generate_function(a):
    def func(x):
        '''a complex function that uses several modules'''
        return x + np.sqrt(a)
    return func

if __name__ == '__main__':
    f = generate_function(0.5)
    x = np.arange(0, 100)
    y = np.array(list(map(f, x))) # want to parallelize this step

With multiprocessing, the nested func causes problems, since pickle can't serialize nested (local) functions:

import multiprocessing as mp
...
pool = mp.Pool(2)
y = np.array(pool.map(f, x))

AttributeError: Can't pickle local object 'generate_function.<locals>.func'
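The error comes from how pickle handles functions: it serializes them by qualified name rather than by value, so a module-level function round-trips fine, but a function defined inside another one can't be looked up again. A minimal demonstration, separate from the code above (names here are illustrative):

```python
import pickle

def top_level(x):
    return x + 1

def make_nested():
    def nested(x):
        return x + 1
    return nested

pickle.dumps(top_level)       # fine: stored as a reference to its qualified name

try:
    pickle.dumps(make_nested())
except AttributeError as e:
    print(e)                  # Can't pickle local object 'make_nested.<locals>.nested'
```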

Even with pathos (which uses dill instead of pickle), the modules are not imported in the worker processes:

import pathos
...
pool = pathos.multiprocessing.ProcessPool(2)
y = np.array(pool.map(f, x))

NameError: name 'np' is not defined

Note that none of the other solutions on the question "Python multiprocessing PicklingError: Can't pickle <type 'function'>" work either.

What's the best way to parallelize this?


So it is possible to get pathos to work by reimporting inside generate_function:

def generate_function(a):
    import numpy as np
    def func(x):
        '''a complex function that uses several modules'''
        return x + np.sqrt(a)
    return func

but I may have several imports across multiple generate_functions and multiple layers of nesting, and keeping track of all that quickly gets cumbersome, so I would like to avoid a mess like this:

def generate_function1(a):
    import module1, module2, module3
    from module4 import a, b
    from module5 import c as d
    from module6 import e as f
    def func(x):
        ...
    return func

def generate_function2(a):
    import module1, module2, module3
    from module4 import a, b
    from module5 import c as d
    from module6 import e as f
    def func(x):
        ...
    return func

def generate_generator_function(a):
    import module1, module2, module3
    from module4 import a, b
    from module5 import c as d
    from module6 import e as f
    def generate_function(a):
        import module1, module2, module3
        from module4 import a, b
        from module5 import c as d
        from module6 import e as f
        def func(x):
            ...
        return func
    return generate_function
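One way to cut the duplication (a sketch, not from the original post): perform the imports once in a single helper function and bind the modules to locals inside each factory. The closure then captures the modules through its cells, which is the same mechanism that makes the reimport workaround serialize correctly under dill, but the import list lives in one place:

```python
def load_deps():
    # stand-in for the repeated import block; maintained in one place
    import numpy as np
    return {"np": np}

def generate_function(a):
    np = load_deps()["np"]    # module becomes a local, captured by the closure
    def func(x):
        return x + np.sqrt(a)
    return func

f = generate_function(0.5)
```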


Solution 1:[1]

You may use concurrent.futures:

import concurrent.futures
import numpy as np

f = generate_function(0.5)  # generate_function as defined in the question
x = np.arange(0, 100)
with concurrent.futures.ThreadPoolExecutor() as ex:
    y = np.array(list(ex.map(f, x)))  # materialize the lazy iterator
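ThreadPoolExecutor sidesteps the problem because worker threads share the interpreter's memory, so the closure is never pickled at all. A minimal self-contained check (names illustrative):

```python
import concurrent.futures

def make_closure(a):
    def func(x):          # nested closure: unpicklable, but threads never pickle it
        return x + a
    return func

f = make_closure(0.5)
with concurrent.futures.ThreadPoolExecutor() as ex:
    results = list(ex.map(f, range(5)))
print(results)            # [0.5, 1.5, 2.5, 3.5, 4.5]
```

Note the trade-off: threads share one GIL, so this parallelizes well only when the work releases the GIL (as many NumPy operations do) or is I/O-bound.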

Solution 2:[2]

This won't solve your pickle problems, but here's my thinking on using classes to manage your imports.

>>> class funcFactory:
...     import numpy as np
...     def __init__(self):
...             pass
...     def makef(self,a):
...             def func(x):
...                     return a+funcFactory.np.sqrt(x)
...             return func
...
>>> ff = funcFactory()
>>> f = ff.makef(1)
>>> f(4)
3.0

Incorporating @Schottky's suggestion to use concurrent.futures, you end up with code that looks like this:

import concurrent.futures
import numpy as np

class funcFactory:
    import numpy as np
    def makef(self, a):
        def func(x):
            return a + funcFactory.np.sqrt(x)
        func.__reduce__ = lambda: ""
        return func

f = funcFactory().makef(0.5)
with concurrent.futures.ThreadPoolExecutor() as ex:
    y = ex.map(f, np.arange(0, 100))
print(list(y))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Schottky
Solution 2: (unattributed)