Python garbage collection when rewriting variables
I am trying to run some very simple code like this in Python 3.9:
```python
import pandas as pd

for i in some_list:
    large_data = pd.read_csv('rawdata_%s.csv' % i)
    procdata = somefunction(large_data)
    procdata.to_csv('file_%s.csv' % i)
```
The list has about 2,000 elements, and each `large_data` file can be ~200MB. The processing produces only a very small file to save (<1MB).

I am running the code on a cluster and allocate 8GB of memory for the task. I assumed that because I keep overwriting the variables, the code would be memory efficient, but sometimes I exceed the limit and get the following error:
```
slurmstepd: error: Job 5118871 exceeded memory limit (8323668 > 8192000), being killed
slurmstepd: error: Exceeded job memory limit
slurmstepd: error: *** JOB 5118871 ON cac074 CANCELLED AT 2022-04-08T16:10:58 ***
***
What am I doing wrong? Isn't Python doing the garbage collection by itself? Thanks!
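One thing worth noting (an assumption about the cause, not confirmed by the post): when `large_data = pd.read_csv(...)` runs on iteration N+1, the frame from iteration N is still bound to `large_data` until the new one is fully built, so two ~200MB objects can be alive at once. Explicitly deleting the references at the end of each iteration (optionally followed by `gc.collect()`) avoids that overlap. A minimal sketch, using plain Python lists as stand-ins for the DataFrames and a hypothetical `load`/`process` pair in place of `pd.read_csv`/`somefunction`:

```python
import gc

def load(i):
    # Stand-in for pd.read_csv: allocate a "large" object.
    return list(range(100_000))

def process(data):
    # Stand-in for somefunction: reduce to a tiny result.
    return sum(data)

results = []
for i in range(3):
    large_data = load(i)
    results.append(process(large_data))
    # Drop the reference before the next load starts, so the old
    # buffer can be freed before the new one is allocated; without
    # this, peak memory is roughly double the size of one object.
    del large_data
    gc.collect()

print(results)
```

In the real loop the same pattern would be `del large_data, procdata` followed by `gc.collect()` as the last statements of the loop body.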
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow