How to create a shared 2D array in Python multiprocessing
I need to process a large matrix using Python's multiprocessing module, so I need a 2D array that can be shared, accessed, and updated by different child processes in a synchronized manner (hence I use mp.Manager).
multiprocessing.Array() only lets me create a 1D array, so I tried to convert the shared 1D array into a 2D NumPy array using np.frombuffer() (see the CreateArray function).
In that function, the CreateArray(3, 4) call creates a shared array of length 12 (= 3 × 4). However, the arr = np.frombuffer(mp_arr.get_obj()) command creates a NumPy array of length 6, so the b = arr.reshape((n, m)) command fails to reshape an array of size 6 into a 3 × 4 matrix.
How do I fix this and get a 2D shared array?
EDIT:
With the correction the code creates a 2D array (thanks to GabrielC). But it looks like the array is not shared: the function addData(array) assigns values to the matrix, but when it is printed from the main process I get only zeros.
Problem: how do I fix this so that the 2D array is actually shared?
```python
import multiprocessing as mp
import numpy as np
import ctypes as c

def CreateArray(n, m):
    mp_arr = mp.Array('i', n*m)
    # arr = np.frombuffer(mp_arr.get_obj())  # This command must be corrected. Thanks to GabrielC
    arr = np.frombuffer(mp_arr.get_obj(), c.c_int)  # mp_arr and arr share the same memory
    # make it two-dimensional
    print('np array len=', len(arr))
    b = arr.reshape((n, m))  # b and arr share the same memory
    return b

def addData(array):
    n, m = np.shape(array)
    i = 0
    for nn in range(n):
        for mm in range(m):
            array[nn][mm] = i
            i = i + 1
    print(array)

if __name__ == '__main__':
    with mp.Manager() as manager:
        Myarray = CreateArray(3, 4)
        p1 = mp.Process(target=addData, args=(Myarray,))
        p1.start()
        p1.join()
        print(Myarray)
```
Solution 1:[1]
You should use the dtype parameter of np.frombuffer when you wrap the buffer with NumPy (the default dtype is float64):
```python
arr = np.frombuffer(mp_arr.get_obj(), c.c_int)
```
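The length of 6 in the question is plain byte arithmetic: the shared buffer holds 12 C ints, and np.frombuffer with its default float64 dtype reinterprets those same bytes as 8-byte floats. A minimal sketch of the mismatch (sizes assume the common case of a 4-byte C int):

```python
import multiprocessing as mp
import numpy as np
import ctypes as c

mp_arr = mp.Array('i', 12)  # 12 C ints: 48 bytes of shared memory on most platforms

wrong = np.frombuffer(mp_arr.get_obj())                 # default dtype is float64 (8 bytes)
right = np.frombuffer(mp_arr.get_obj(), dtype=c.c_int)  # matches the 'i' typecode

print(len(wrong))  # 6  (48 bytes / 8 bytes per float64)
print(len(right))  # 12 (48 bytes / 4 bytes per C int)
```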
Solution 2:[2]
I believe you should pass the instance of the shared memory itself. You are currently passing the NumPy array, which will be de-serialized (copying its data) when it is sent to the child process. You will also need to pass the dimensions and reshape the array inside addData, but reshaping just provides a new view into the array and is fast.
```python
import multiprocessing as mp
import numpy as np
import ctypes as c

def CreateArray(n, m):
    # Return the shared Array itself, not a NumPy view of it
    return mp.Array('i', n*m)

def addData(mp_arr, n, m):
    # Re-wrap the shared buffer in the child; this is a view, not a copy
    arr = np.frombuffer(mp_arr.get_obj(), c.c_int)
    arr = arr.reshape((n, m))
    i = 0
    for nn in range(n):
        for mm in range(m):
            arr[nn][mm] = i
            i = i + 1
    print(arr)

if __name__ == '__main__':
    with mp.Manager() as manager:  # the Manager is not actually used; mp.Array provides the shared memory
        Myarray = CreateArray(3, 4)
        p1 = mp.Process(target=addData, args=(Myarray, 3, 4))
        p1.start()
        p1.join()
        # Wrap again in the parent to see the values written by the child
        print(np.frombuffer(Myarray.get_obj(), c.c_int).reshape((3, 4)))
```
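To see why passing the NumPy view fails while passing the mp.Array works: pickling a NumPy array copies its data, whereas a multiprocessing.Array is sent to the child as a handle to the same shared memory. A minimal sketch (the helper names are illustrative; 'spawn' is forced so that the arguments are pickled, as they always are on Windows):

```python
import multiprocessing as mp
import numpy as np
import ctypes as c

def write_via_view(np_view):
    np_view[:] = 1  # mutates a pickled, private copy in the child

def write_via_shared(shared):
    arr = np.frombuffer(shared.get_obj(), c.c_int)  # re-wrap the shared buffer
    arr[:] = 1

if __name__ == '__main__':
    ctx = mp.get_context('spawn')  # force pickling of args
    shared = ctx.Array('i', 4)
    view = np.frombuffer(shared.get_obj(), c.c_int)

    p = ctx.Process(target=write_via_view, args=(view,))
    p.start(); p.join()
    print(view)  # still [0 0 0 0]: the child changed its own copy

    p = ctx.Process(target=write_via_shared, args=(shared,))
    p.start(); p.join()
    print(view)  # [1 1 1 1]: the child wrote into the shared buffer
```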
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | GabrielC |
| Solution 2 | Raphael Erik Hviding |

