pandas list of DataFrames very slow

I have an image sequence with 200 images. Each image shows several clusters from an explosion experiment, and I want to compute statistics over time. I therefore split the task into two scripts:

1. A Python script that does the image processing with OpenCV (findContours, convexHull, connectedComponentsWithStats, etc.). For every image, I serialize the results as a pandas DataFrame to a pickle file. All pickle files together are about 13 MB.
2. A Jupyter notebook that loads all DataFrames into a list. This is where a major problem occurred: DataSpell becomes very slow. It cannot load variables in a reasonable time, cells take longer than normal to execute, and creating a single plot takes minutes. I think creating the list is responsible for the slowdown. How could I store my data better? Is a pandas MultiIndex a good choice when I have two dimensions, time and cluster number, with all of each cluster's parameters?
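One way to use a MultiIndex here is to stack the per-image DataFrames into a single DataFrame whose index has two levels: the frame (time) and the cluster within that frame. Statistics over time then become a groupby on the frame level. The following is a minimal sketch with made-up data; the columns `area` and `perimeter` are hypothetical placeholders for whatever the image-processing script actually stores.

```python
import numpy as np
import pandas as pd

# Hypothetical per-image results: one DataFrame per frame, one row per cluster.
# The number of clusters differs between frames, as in a real image sequence.
rng = np.random.default_rng(0)
frames = [
    pd.DataFrame({
        "area": rng.uniform(10, 100, size=n_clusters),
        "perimeter": rng.uniform(5, 50, size=n_clusters),
    })
    for n_clusters in (3, 5, 4)
]

# Stack them into one DataFrame with a (frame, cluster) MultiIndex.
# keys= becomes the outer level; each frame's own RangeIndex becomes the inner level.
df = pd.concat(frames, keys=list(range(len(frames))), names=["frame", "cluster"])

# Statistics over time: aggregate all clusters within each frame.
per_frame = df.groupby(level="frame").agg(
    n_clusters=("area", "count"),
    mean_area=("area", "mean"),
    total_area=("area", "sum"),
)
print(per_frame)

# All clusters of a single frame, indexed by cluster number:
print(df.loc[1])
```

With this layout the notebook holds one object instead of hundreds, and `per_frame` can be plotted directly against the frame index.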

Thank you for your help

My code:

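import os
import glob

import pandas as pd
from tqdm import tqdm

folder = "dataframes_Versuch_T3_5_4g_Clusters"
# glob order is arbitrary: sort so the i-th geometry file lines up with the
# i-th statistics file (assumes both folders use matching file names)
fListGeo = sorted(glob.glob(os.path.join(folder, 'geometry', '*')))
fListStats = sorted(glob.glob(os.path.join(folder, 'statistic', '*')))

dfListGeo = []
dfListStats = []
# zip() has no len(), so pass total= to get a proper tqdm progress bar
for f1, f2 in tqdm(zip(fListGeo, fListStats), total=len(fListGeo)):
    dfListGeo.append(pd.read_pickle(f1))
    dfListStats.append(pd.read_pickle(f2))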
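Building on the loop above, here is a sketch that concatenates the pickles of one folder into a single DataFrame with a (frame, cluster) MultiIndex and writes it back as one file, so the notebook only has to load two objects instead of two lists of 200. It assumes each pickle holds one DataFrame per image, with one row per cluster; the helper `load_folder` and the file names `frame_000.pkl` and `geometry_all.pkl` are hypothetical. The demo writes a few dummy pickles to a temporary directory so the block runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd


def load_folder(path):
    """Concatenate all pickled per-image DataFrames in `path` into one DataFrame.

    The outer index level is the file name without extension (identifies the
    image); the inner level is each file's own row index (the cluster).
    """
    files = sorted(glob.glob(os.path.join(path, "*")))
    keys = [os.path.splitext(os.path.basename(f))[0] for f in files]
    return pd.concat(
        [pd.read_pickle(f) for f in files],
        keys=keys,
        names=["frame", "cluster"],
    )


# Demo with dummy data in a temporary folder (stands in for folder/"geometry").
with tempfile.TemporaryDirectory() as tmp:
    for i, n_clusters in enumerate((2, 3)):
        # Zero-padded frame numbers keep sorted() in numeric order.
        pd.DataFrame({"area": range(n_clusters)}).to_pickle(
            os.path.join(tmp, f"frame_{i:03d}.pkl")
        )
    geo = load_folder(tmp)
    # Save the combined DataFrame once; the notebook then loads only this file.
    out = os.path.join(tmp, "geometry_all.pkl")
    geo.to_pickle(out)
    geo_loaded = pd.read_pickle(out)

print(geo_loaded)
```

The zero padding matters because `sorted()` compares strings: without it, `frame_10` would sort before `frame_2`.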


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
