Single Pandas DataFrame or multiple DataFrames
I have about a million devices, and I will receive counter values from each of them at periodic intervals. These values have to be held in memory so that I can do further processing on them.
To handle this, I am using a Pandas DataFrame with 2 columns: one for the timestamp and one for the actual counter value. I am considering two options:
1.) One DataFrame for all 1 million devices.
2.) Multiple DataFrames. The devices are logically grouped by location, so instead of holding all 1 million device counters in a single DataFrame, I can create one DataFrame per group, based on the location of the server where the device resides. This comes to close to 1000 DataFrames.
The reason for splitting the data into 1000 DataFrames is parallelism: with multi-processing I can read and write the groups in parallel. I also expect calculations to take less time on a smaller DataFrame.
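A minimal sketch of the two options, to make the comparison concrete. The `device_id` and `location` columns, the site labels, and the random counter values are assumptions made up for illustration; they are not part of the 2-column layout described above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 6 devices in 2 locations, 3 polls each.
rng = np.random.default_rng(0)
timestamps = pd.date_range("2024-01-01", periods=3, freq="5min")
records = pd.DataFrame({
    "device_id": np.repeat(np.arange(6), len(timestamps)),
    "location": np.repeat(["site_a", "site_b"], 3 * len(timestamps)),
    "timestamp": np.tile(timestamps.to_numpy(), 6),
    "counter": rng.integers(0, 1000, size=6 * len(timestamps)),
})

# Option 1: one DataFrame for everything; per-group work goes through groupby.
single = records
mean_per_location = single.groupby("location")["counter"].mean()

# Option 2: one DataFrame per location, held in a dict keyed by location.
per_location = {
    loc: group.drop(columns="location").reset_index(drop=True)
    for loc, group in records.groupby("location")
}

print(mean_per_location)
print({loc: len(df) for loc, df in per_location.items()})
```

With option 2, each entry of `per_location` could be handed to a separate worker process, which is the parallelism argument made above.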
My questions are:
1.) What is the maximum number of DataFrames that can be created in Pandas on a system with 1-2 GB of RAM?
2.) Does having a larger number of DataFrames introduce any overhead in Pandas?
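One way to get a rough feel for both questions is to measure the memory pandas reports for one frame versus many small ones. The sketch below uses scaled-down, made-up data (10,000 devices in 100 groups, with hypothetical `device_id` and `group_id` columns). Note that `memory_usage(deep=True)` only counts column and index data, not the Python object overhead of each DataFrame, so it gives a lower bound rather than an exact figure.

```python
import numpy as np
import pandas as pd

# Hypothetical scaled-down data: 10,000 devices in 100 groups, one poll each.
n_devices, n_groups = 10_000, 100
frame = pd.DataFrame({
    "device_id": np.arange(n_devices),
    "group_id": np.arange(n_devices) % n_groups,
    "timestamp": pd.Timestamp("2024-01-01"),
    "counter": np.arange(n_devices, dtype="int64"),
})

# Bytes reported for the single DataFrame (columns plus index).
single_bytes = int(frame.memory_usage(deep=True).sum())

# Split into one DataFrame per group and add up the reported bytes.
parts = {
    gid: part.reset_index(drop=True)
    for gid, part in frame.groupby("group_id")
}
split_bytes = int(sum(part.memory_usage(deep=True).sum() for part in parts.values()))

print(f"single frame: {single_bytes} bytes")
print(f"{len(parts)} frames: {split_bytes} bytes")
```

Scaling `n_devices` and `n_groups` towards the real numbers, and multiplying by the number of polls kept in memory, should give an estimate to compare against the 1-2 GB budget.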
I would like opinions on the best approach: should I go with 1000 DataFrames or 1 huge DataFrame?
Please note: the devices are on the network, and I receive data from them at a periodic interval (e.g. every 5 minutes); there are approximately 1,000,000 devices in total. I also need to delete old data regularly, depending on some conditions. So this is not just a matter of reading data and appending it to a DataFrame, but also of deleting old data.
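The append-and-delete cycle could look roughly like the sketch below. The helper names (`make_batch`, `append_batch`, `drop_older_than`), the `device_id` column, and the 10-minute retention window are all hypothetical; the real deletion condition may be something other than age.

```python
import pandas as pd

def make_batch(poll_time, n_devices=4):
    """Build one made-up poll result: one counter value per device."""
    return pd.DataFrame({
        "device_id": range(n_devices),
        "timestamp": poll_time,
        "counter": [poll_time.minute * 10 + i for i in range(n_devices)],
    })

def append_batch(frame, batch):
    """Return a new frame with the batch appended (concat copies the data)."""
    return pd.concat([frame, batch], ignore_index=True)

def drop_older_than(frame, cutoff):
    """Keep only rows whose timestamp is at or after the cutoff."""
    return frame[frame["timestamp"] >= cutoff].reset_index(drop=True)

# Four polls, 5 minutes apart.
poll_times = pd.date_range("2024-01-01 00:00", periods=4, freq="5min")
store = make_batch(poll_times[0])
for poll_time in poll_times[1:]:
    store = append_batch(store, make_batch(poll_time))

# Assumed retention rule: keep the last 10 minutes relative to the latest poll.
cutoff = poll_times[-1] - pd.Timedelta(minutes=10)
store = drop_older_than(store, cutoff)
print(store)
```

Since `pd.concat` builds a new DataFrame on every call, appending one poll at a time to a very large frame copies all existing rows each time, which is one practical cost to weigh when comparing 1 huge DataFrame with 1000 smaller ones.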
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
