How to create a shared 2D array in Python multiprocessing

I need to process a large matrix using Python's multiprocessing, which means I need a 2D array that can be shared, accessed, and updated by different child processes in a synchronized manner. (This is why I use mp.Manager.)

However, multiprocessing.Array() only lets me create a 1D array. I tried to convert the shared 1D array into a 2D numpy array using np.frombuffer() (see the CreateArray function).

In that function, the CreateArray(3, 4) call creates a shared array of length 12 (= 3 × 4). However, the arr = np.frombuffer(mp_arr.get_obj()) line creates a numpy array of length 6! As a result, the b = arr.reshape((n, m)) line fails to reshape an array of size 6 into a 3×4 matrix.

How to fix this and get a 2D shared array?

EDIT: With the correction the code creates a 2D array (thanks to GabrielC). But it looks like the array is not shared: the addData(array) function assigns some values to the matrix, but when I print it from the main process I get only zeros.

Problem: How to fix this and get a 2D shared array?

import multiprocessing as mp
import numpy as np
import ctypes as c

def CreateArray(n,m):
    mp_arr=mp.Array('i',n*m)
#    arr = np.frombuffer(mp_arr.get_obj())  #This command must be corrected. Thanks to GabrielC
    arr = np.frombuffer(mp_arr.get_obj(),c.c_int)  # mp_arr and arr share the same memory
    # make it two-dimensional
    print('np array len=',len(arr))
    b = arr.reshape((n, m))  # b and arr share the same memory
    return b

def addData(array):
    n,m=np.shape(array)
    i=0
    for nn in range(n):
        for mm in range(m):
            array[nn][mm]=i
            i=i+1
    print(array)

if __name__=='__main__':
    with mp.Manager() as manager:
        Myarray=CreateArray(3,4)
        p1=mp.Process(target=addData,args=(Myarray,))
        p1.start()
        p1.join()
        print(Myarray)




Solution 1:[1]

You should use the dtype parameter of np.frombuffer when you load the buffer with numpy (the default dtype is float64):

arr = np.frombuffer(mp_arr.get_obj(), c.c_int)
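The length mismatch in the question follows directly from this: the buffer holds 12 × 4 = 48 bytes of C ints, and the default float64 dtype (8 bytes per element) reads it as 48 / 8 = 6 elements. A minimal sketch illustrating the difference:

```python
import ctypes as c
import multiprocessing as mp
import numpy as np

mp_arr = mp.Array('i', 12)   # 12 C ints = 48 bytes of shared memory

# Default dtype is float64 (8 bytes), so 48 bytes yield only 6 elements
wrong = np.frombuffer(mp_arr.get_obj())
print(len(wrong))   # 6

# Matching dtype (c_int, 4 bytes) recovers all 12 elements
right = np.frombuffer(mp_arr.get_obj(), dtype=c.c_int)
print(len(right))   # 12
```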

Solution 2:[2]

I believe you should pass the shared-memory instance itself. You are currently passing the numpy array, which gets serialized and de-serialized (i.e. copied) when it crosses the process boundary. You will have to reshape the array inside the addData function, but this just provides a new view into the array and should be fast.

import multiprocessing as mp
import numpy as np
import ctypes as c

def CreateArray(n, m):
    return mp.Array('i', n * m)

def addData(mp_arr, n, m):
    # wrap the shared buffer; the dtype must match the 'i' (c_int) typecode
    arr = np.frombuffer(mp_arr.get_obj(), c.c_int)
    arr = arr.reshape((n, m))  # a view into the shared buffer, not a copy

    i = 0
    for nn in range(n):
        for mm in range(m):
            arr[nn][mm] = i
            i = i + 1
    print(arr)

if __name__ == '__main__':
    Myarray = CreateArray(3, 4)
    # pass the shape along, since addData needs n and m to reshape
    p1 = mp.Process(target=addData, args=(Myarray, 3, 4))
    p1.start()
    p1.join()
    print(Myarray[:])  # the child's writes are visible in the parent

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: GabrielC
Solution 2: Raphael Erik Hviding