'Calculate a mean of pandas dataframe whose cells are list

Suppose I have the following pandas dataframe

import pandas as pd
import numpy as np
df= pd.DataFrame(np.nan, columns =["A","B","C"], index =np.arange(5))
df=df.astype(object)
for c in list(df):
    for i in df.index.values:
        df.at[i, c]=np.arange(5).tolist()

This results in df whose cells are numpy arrays

df
Out[16]: 
                 A                B                C
0  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
1  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
2  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
3  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]

I would like to calculate the mean of the dataframe but it does not work as each cell is treated as a string. For example,

type(df.loc[0][0])
Out[19]: list

Thus, if I calculate its mean, it returns nan

df["Average"]= df.mean(axis=1)

df
Out[21]: 
                 A                B                C  Average
0  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN
1  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN
2  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN
3  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN

My question is, how would I convert this df back to numeric values that I can work with?



Solution 1:[1]

I think idea convert values to columns is really good, because then is possible use pandas vectorized functions:

df1 = pd.concat([pd.DataFrame(df[c].values.tolist()) for c in df.columns], 
                 axis=1, 
                 keys=df.columns)
df1.columns = ['{}{}'.format(i, j) for i, j in df1.columns]
print (df1)
   A0  A1  A2  A3  A4  B0  B1  B2  B3  B4  C0  C1  C2  C3  C4
0   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
1   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
2   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
3   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
4   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4

But if need mean of all lists together:

df= pd.DataFrame(np.nan, columns =["A","B","C"], index =np.arange(5))
df=df.astype(object)
for c in list(df):
    for i in df.index.values:
        df.at[i, c]=np.arange(i+1).tolist()
print (df)
                 A                B                C
0              [0]              [0]              [0]
1           [0, 1]           [0, 1]           [0, 1]
2        [0, 1, 2]        [0, 1, 2]        [0, 1, 2]
3     [0, 1, 2, 3]     [0, 1, 2, 3]     [0, 1, 2, 3]
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]

from itertools import chain
from statistics import mean
df['Average'] = [mean(list(chain.from_iterable(x))) for x in df.values.tolist()]
print (df)
                 A                B                C  Average
0              [0]              [0]              [0]      0.0
1           [0, 1]           [0, 1]           [0, 1]      0.5
2        [0, 1, 2]        [0, 1, 2]        [0, 1, 2]      1.0
3     [0, 1, 2, 3]     [0, 1, 2, 3]     [0, 1, 2, 3]      1.5
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      2.0

EDIT:

If values are strings:

df= pd.DataFrame(np.nan, columns =["A","B","C"], index =np.arange(5))
df=df.astype(object)
for c in list(df):
    for i in df.index.values:
        df.at[i, c]=np.arange(5).tolist()

df=df.astype(str)
print (df)
                 A                B                C
0  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
1  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
2  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
3  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]

df1 = pd.concat([df[c].str.strip('[]').str.split(', ', expand=True) for c in df.columns], 
                 axis=1, 
                 keys=df.columns).astype(float)
df1.columns = ['{}{}'.format(i, j) for i, j in df1.columns]
df1["Average"]= df1.mean(axis=1)
print (df1)
    A0   A1   A2   A3   A4   B0   B1   B2   B3   B4   C0   C1   C2   C3   C4  \
0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0   
1  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0   
2  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0   
3  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0   
4  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0   

   Average  
0      2.0  
1      2.0  
2      2.0  
3      2.0  
4      2.0  

Solution 2:[2]

You might want to restructure your dataframe as mentioned. But to work with what you have, assuming you want the mean of each element in the dataframe, you can try the applymap method.

df.applymap(np.mean)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 busybear