pandas list of dataframes very slow
I have an image sequence with 200 images. Each image shows several clusters from an explosion experiment. I now want to compute statistics over time, so I split the task into two parts:
- A Python script that does the image processing with OpenCV (findContours, convexHull, connectedComponentsWithStats, etc.). For every image I serialize my data as a pandas DataFrame to a pickle file. All pickle files together are about 13 MB.
- A Jupyter notebook that loads all DataFrames into a list. This is where a major problem occurs: DataSpell becomes very slow (the variable view does not load in reasonable time, cell execution takes longer than normal, and creating a single plot takes minutes). I suspect that building the list is responsible for the slowdown. How could I store my data better? Is a pandas MultiIndex a good choice if I have two dimensions, time and cluster number, with all the parameters of each cluster?
Thank you for your help.
My code added:
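import glob
import os

import pandas as pd
from tqdm import tqdm

folder = "dataframes_Versuch_T3_5_4g_Clusters"
# glob returns files in arbitrary order; sorting keeps the two lists paired
# (assumes matching file names in both folders).
fListGeo = sorted(glob.glob(os.path.join(folder, 'geometry', '*')))
fListStats = sorted(glob.glob(os.path.join(folder, 'statistic', '*')))

dfListGeo = []
dfListStats = []
# Load one geometry frame and one statistics frame per image.
for f1, f2 in tqdm(zip(fListGeo, fListStats), total=len(fListGeo)):
    dfListGeo.append(pd.read_pickle(f1))
    dfListStats.append(pd.read_pickle(f2))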
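To make the MultiIndex question concrete, here is a minimal sketch of what combining the per-image frames into one DataFrame could look like. It assumes each frame has one row per cluster and the same columns in every image. The helper `make_frame` and the column names `area` and `perimeter` are hypothetical stand-ins for the unpickled data, not the real layout. Whether this removes the slowdown in DataSpell is not something the sketch can confirm.

```python
import numpy as np
import pandas as pd


def make_frame(n_clusters, seed):
    """Hypothetical stand-in for one unpickled per-image frame."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "area": rng.uniform(1.0, 100.0, n_clusters),
        "perimeter": rng.uniform(1.0, 50.0, n_clusters),
    })


# Three hypothetical time steps with a different number of clusters each.
frames = [make_frame(n, seed=i) for i, n in enumerate([3, 5, 2])]

# pd.concat with keys= builds a two-level index: (frame, cluster).
combined = pd.concat(
    frames,
    keys=list(range(len(frames))),
    names=["frame", "cluster"],
)

# Statistics over time: group by the outer index level.
per_frame = combined.groupby(level="frame")["area"].agg(["count", "mean"])

# All clusters of a single time step.
clusters_in_frame_1 = combined.loc[1]
```

With the real data, `frames` would be `dfListGeo` or `dfListStats`, and the single combined frame could be written once with `combined.to_pickle(...)` instead of keeping 200 separate objects in the notebook.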
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow