Python File Handling Differences between Linux and Windows
I have a weird problem I'm trying to understand.
I'm working on a project where I'm trying to use multiprocessing to speed up tasks that can run concurrently. I've noticed that an array that I update and then pickle is not actually being updated with each run.
I created a simple demonstration to abstract out the actual function that I'm struggling with. Sandbox is my driver script that handles all of the multiprocessing. Sandbox2 is a module that reads in a pickle, stores it as an array, then adds a value to that array and writes the pickle back out:
Sandbox.py (Driver):
import sandbox2, multiprocessing
from multiprocessing import Process

if __name__ == '__main__':
    for x in range(1, 5):
        print("Run", x)
        # One child process per run, started and joined before the next run
        processes = []
        processes.append(Process(target=sandbox2.run))
        for process in processes:
            process.start()
        for process in processes:
            process.join()
        print()
Sandbox2.py (Module):
import pickle

array = []
picklepath = 'test.pickle'

# Runs once, when the module is imported
try:
    picklefile = open(picklepath,'rb')
    array = pickle.load(picklefile)
    picklefile.close()
except:
    print("Pickle doesn't exist")

def addVal(val):
    # Append to the module-level array and write the whole array back out
    array.append(val)
    picklefile = open(picklepath,'wb+')
    pickle.dump(array,picklefile)
    picklefile.close()

def run():
    print("Start:",len(array))
    addVal('Array Value')
    print("End: ",len(array))
I expect the number of objects in the array to increase by one with each run. What actually happens is that the value is added but is missing again on the next iteration. However, when I run the program again, one value is read in from the pickle. See the output of two consecutive invocations below:
bryan@LinTop:$python3 sandbox.py
Pickle doesn't exist
Run 1
Start: 0
End: 1
Run 2
Start: 0
End: 1
Run 3
Start: 0
End: 1
Run 4
Start: 0
End: 1
bryan@LinTop:$python3 sandbox.py
Run 1
Start: 1
End: 2
Run 2
Start: 1
End: 2
Run 3
Start: 1
End: 2
Run 4
Start: 1
End: 2
bryan@LinTop:$
Notice how, in the first invocation, each run starts with an empty array, but in the second invocation each run starts with 1 object. I would expect the counts to increment by 1 over each run and start at 4 when I run the code again.
Now, cue a single-threaded approach (Sandbox3):
import sandbox2

if __name__ == '__main__':
    for x in range(1, 5):
        print("Run", x)
        sandbox2.run()
        print()
The output from this model is what I expect to happen:
bryan@LinTop:$python3 sandbox3.py
Pickle doesn't exist
Run 1
Start: 0
End: 1
Run 2
Start: 1
End: 2
Run 3
Start: 2
End: 3
Run 4
Start: 3
End: 4
bryan@LinTop:$python3 sandbox3.py
Run 1
Start: 4
End: 5
Run 2
Start: 5
End: 6
Run 3
Start: 6
End: 7
Run 4
Start: 7
End: 8
bryan@LinTop:$
What am I missing about multiprocessing here, and is there an easy solution to this problem? I adapted my project from a single-threaded model, so it used to work, but it no longer works with the multiprocessing model.
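A possible explanation, offered as an assumption rather than something the post confirms: Sandbox2 reads the pickle once, at import time. If the child processes are created by forking (traditionally the default on Linux), each child inherits the parent's already-imported module, so every child starts from the array the parent loaded before any run. A minimal sketch of one workaround is to read the pickle inside the function instead of at module level. The helper names `load_array` and `add_val` and the `path` parameter are hypothetical and not part of the original code.

```python
import os
import pickle

picklepath = 'test.pickle'  # same file name as in the question


def load_array(path=picklepath):
    """Read the list from disk, or return an empty list if the file is missing."""
    if not os.path.exists(path):
        return []
    with open(path, 'rb') as picklefile:
        return pickle.load(picklefile)


def add_val(val, path=picklepath):
    """Load the current list, append one value, and write the list back."""
    array = load_array(path)
    array.append(val)
    with open(path, 'wb') as picklefile:
        pickle.dump(array, picklefile)
    return array


def run(path=picklepath):
    """Report the length before and after adding one value; return the new length."""
    array = load_array(path)
    print("Start:", len(array))
    array = add_val('Array Value', path)
    print("End:  ", len(array))
    return len(array)
```

Because the file is read on every call, the result should no longer depend on what the parent process had in memory when the child was created. This sketch does not lock the file, so it assumes only one process writes at a time, as in the driver above, where each process is joined before the next one starts.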
EDIT:
This gets weirder. When I run my code on a Windows system, it behaves as I thought that it should:
py .\sandbox.py
Pickle doesn't exist
Run 1
Pickle doesn't exist
Start: 0
End: 1
Run 2
End: 2
Run 3
Start: 2
End: 3
Run 4
Start: 3
End: 4
Is there something about Linux that changes the way the files behave?
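One way to check whether the two systems create child processes differently, rather than handle files differently, is to print the start method that multiprocessing uses. This is only a small diagnostic sketch, and `describe_start_method` is a hypothetical helper name. On Windows the method is `spawn`, which starts a fresh interpreter that re-imports Sandbox2 and so re-reads the pickle (consistent with "Pickle doesn't exist" appearing twice in the Windows output). On Linux it has traditionally been `fork`, which copies the parent's memory, including the array the parent already loaded.

```python
import multiprocessing


def describe_start_method():
    """Return the name of the start method multiprocessing uses by default here."""
    return multiprocessing.get_start_method()


if __name__ == '__main__':
    # Print the default and the alternatives available on this platform
    print("Start method:", describe_start_method())
    print("Available:", multiprocessing.get_all_start_methods())
```

If the two machines report different start methods, that difference, rather than the file system, would be the first thing to look at.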
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
