Split list automatically for multiprocessing

I am learning multiprocessing in Python and thinking about a problem. For a shared list (nums = mp.Manager().list(...)), is there any way to split it automatically across all the processes, so that they do not compute on the same numbers in parallel?

Current code:

# multiple processes
import multiprocessing as mp
import time

nums = mp.Manager().list(range(10000))
results = mp.Queue()

def get_square(list_of_num, results_sharedlist):
    # square every number and put the whole result list on the queue
    results_sharedlist.put(list(map(lambda x: x**2, list_of_num)))

start = time.time()
process1 = mp.Process(target=get_square, args=(nums, results))
process2 = mp.Process(target=get_square, args=(nums, results))

process1.start()
process2.start()
process1.join()
process2.join()

print(time.time()-start)
for i in range(results.qsize()):
    print(results.get())

Current Behaviour

It computes the squares of the same list twice.

What I want

I want process 1 and process 2 to compute the squares of the nums list exactly once, in parallel, without me defining the split myself.



Solution 1:[1]

You can make the function decide which part of the data it should operate on. In this scenario, you want the function to divide the square-calculation work on its own, based on how many processes are running in parallel.

To do so, the function needs to know which process it is running in and how many processes are running in total, so that it can work on a specific slice of the data. You can simply pass two more parameters to the function that carry this information: current_process and total_process.

If the length of the list is divisible by 2 and you want two processes to calculate the squares, the function would look something like this:

def get_square(list_of_num, results_sharedlist, current_process, total_process):
    # each process works on its own contiguous slice of the list
    total_length = len(list_of_num)
    start = (total_length // total_process) * (current_process - 1)
    end = (total_length // total_process) * current_process
    results_sharedlist.put(list(map(lambda x: x**2, list_of_num[start:end])))

TOTAL_PROCESSES = 2
process1 = mp.Process(target=get_square, args=(nums, results, 1, TOTAL_PROCESSES))
process2 = mp.Process(target=get_square, args=(nums, results, 2, TOTAL_PROCESSES))

The assumption made here is that the length of the list is a multiple of the number of processes you allocate. If it is not, the integer division will leave the last few numbers unprocessed.

Hope this answers your question!

Solution 2:[2]

I agree with the answer by Jake here, but as a bonus: if you are using a multiprocessing.Pool(), it keeps an internal counter of the worker processes it spawns, so you can avoid the current_process parameter by reading _identity from multiprocessing.current_process() inside the worker, like this:

from multiprocessing import current_process, Pool

# run this inside a Pool worker: _identity[0] is the worker's 1-based index
# (in the main process, _identity is an empty tuple)
p = current_process()
print('process counter:', p._identity[0])

More info in this answer.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jake Peralta
Solution 2 rikyeah