Create dict in one module and access it in another module through multiprocessing
I am new to Python and to its multiprocessing concepts (this is my first Python project).
I have written a few modules and wired them together to run sequentially. I now need to speed a few things up.
What I want to achieve is:
module-one.py
    Read a JSON file and store it as a dict (normal dict or multiprocessing.Manager().dict())
    module-two.method()
module-two.py
    -- Some methods for business logic --
    multiprocessing.Process(target=module-three.method)
module-three.py
    def method():
        multiprocessing.Process(target=module-four.method)
module-four.py
    def method():
        I should access the dict that was created in module-one
        The global dict that multiple processes can access
        --- More business logic and data transformations ---
Note:
I am constrained not to use any framework such as Flask; otherwise I could have tried Flask's g to store things globally.
I am constrained not to use any external caching mechanism such as memcached or Redis.
To lessen the overhead, I tried combining modules three and four into one. That did not help either: the dict in module-four or module-three is always empty.
My questions are:
- Is it possible to achieve what i have posted above?
- If it is not possible, what are the alternate ways to handle my requirements.
I browsed Stack Overflow and other forums extensively. I found many single-module examples where a dict is created in the module namespace or inside a class and that same dict is passed as an argument to the spawned processes. Based on those examples, it looks like I should pass the dict from module-one to module-two and so on up to module-four. I felt that there might be a better approach than passing the dict from one module to another, hence this question.
Thanks, A newbie python coder
Solution 1:[1]
This is how you can avoid passing the managed dictionary explicitly to your worker function, which can instead access it as a global variable:
Both multiprocessing.pool.Pool and concurrent.futures.ProcessPoolExecutor accept initializer and initargs arguments. The initializer function is called once in each process of the pool, with initargs as its arguments, so it can set up global variables for that process. For example, using multiprocessing.pool.Pool, the following code works on both Windows and Linux:
from multiprocessing import Pool, Manager

def init_pool_processes(d):
    # Initialize global variable managed_dict for each pool process:
    global managed_dict
    managed_dict = d

def worker(i):
    managed_dict[i] = i ** 2

def main():
    manager = Manager()
    managed_dict = manager.dict()
    pool = Pool(initializer=init_pool_processes, initargs=(managed_dict,))
    pool.map(worker, range(10))
    pool.close()
    pool.join()
    print(managed_dict)

if __name__ == '__main__': # Required for Windows
    main()
Prints:
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
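The same initializer/initargs pattern should carry over to concurrent.futures.ProcessPoolExecutor. The following is only a minimal sketch: the names init_worker, worker and run, and the choice of two workers, are illustrative assumptions and not part of the code above.

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager


def init_worker(d):
    # Runs once in each worker process: store the proxy in a module-level global.
    global managed_dict
    managed_dict = d


def worker(i):
    # Uses the global set by init_worker; no dict argument is passed in.
    managed_dict[i] = i ** 2


def run(n=10):
    # Hypothetical helper: fill a managed dict from a small process pool
    # and return an ordinary dict copy of the result.
    with Manager() as manager:
        shared = manager.dict()
        with ProcessPoolExecutor(
            max_workers=2, initializer=init_worker, initargs=(shared,)
        ) as executor:
            # Consume the iterator so that all tasks have finished.
            list(executor.map(worker, range(n)))
        return dict(shared)


if __name__ == "__main__":  # Required on platforms that use spawn
    print(run())
```

Copying the proxy into a plain dict before the manager shuts down keeps the result usable after the with block exits.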
If you are running on a platform such as Linux that uses the fork method to create new processes, the new processes automatically inherit the main process's global variables as they were at the time of the fork. The memory is shared copy-on-write: when a subprocess modifies such a variable, it modifies its own private copy, and the change is not seen by the main process or by the other subprocesses. In this case, however, the global variable is a reference to a managed dictionary (actually, a reference to a proxy for the actual dictionary). The subprocesses never modify this reference, only what it refers to, i.e. the dictionary itself, which lives in the manager's process. So on such a platform the following code can be used:
from multiprocessing import Pool, Manager

def worker(i):
    managed_dict[i] = i ** 2

def main():
    global managed_dict
    manager = Manager()
    managed_dict = manager.dict()
    # the processes created will inherit global managed_dict:
    pool = Pool()
    pool.map(worker, range(10))
    pool.close()
    pool.join()
    print(managed_dict)

main()
Prints:
{0: 0, 8: 64, 9: 81, 1: 1, 2: 4, 3: 9, 5: 25, 6: 36, 7: 49, 4: 16}
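To see why a managed dictionary is needed for writes, the copying behaviour described above can be observed with a plain (unmanaged) global dict. This is a small sketch that explicitly requests the fork start method, so it runs only on POSIX platforms; config, child and demo are made-up names for illustration.

```python
import multiprocessing as mp

config = {"mode": "initial"}  # plain module-level dict, not managed


def child(queue):
    # The forked child starts with the parent's dict contents,
    # but this assignment changes only the child's private copy.
    config["mode"] = "changed in child"
    queue.put(config["mode"])


def demo():
    ctx = mp.get_context("fork")  # assumption: a platform where fork is available
    queue = ctx.Queue()
    p = ctx.Process(target=child, args=(queue,))
    p.start()
    seen_in_child = queue.get()
    p.join()
    # The parent's dict is untouched by the child's write.
    return seen_in_child, config["mode"]


if __name__ == "__main__":
    print(demo())  # ('changed in child', 'initial')
```

So a plain global dict is enough under fork when the subprocesses only read data that was loaded before they were started; a managed dict is needed when their writes must be visible elsewhere.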
This is why, when you post a question tagged with multiprocessing, you should also tag it with the platform you are running on: the answer very much depends on the platform.
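For the layout in the question, where one process starts another that in turn starts a third, a platform-independent alternative is to pass the proxy explicitly through each Process. The sketch below collapses the modules into one file; step_three and step_four are hypothetical stand-ins for module-three.method and module-four.method, and the JSON string stands in for the file read in module-one.

```python
import json
from multiprocessing import Manager, Process


def step_four(shared, results):
    # Stand-in for module-four.method(): read the shared dict, store a result.
    results["doubled"] = shared["value"] * 2


def step_three(shared, results):
    # Stand-in for module-three.method(): start a further process
    # and hand the same proxies on to it.
    p = Process(target=step_four, args=(shared, results))
    p.start()
    p.join()


def main():
    with Manager() as manager:
        # Stand-in for module-one: load JSON into a managed dict.
        shared = manager.dict(json.loads('{"value": 21}'))
        results = manager.dict()
        p = Process(target=step_three, args=(shared, results))
        p.start()
        p.join()
        print(dict(results))


if __name__ == "__main__":  # Required on platforms that use spawn
    main()
```

Plain Process objects are used here because pool workers are daemonic and are not allowed to start child processes of their own.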
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Booboo |
