'cleaning dataframe columns with single cell arrays of different types

I am working on a large dataframe with multiple columns. However, some of columns have data in the form of arrays with in arrays (single value). I need to convert the dataframe columns with only cell values i.e., without the array element style. I have tried flatten, squeeze in different ways, but could not get the output in the way I am looking. Following code reproduces the data format I am working at present:

import pandas as pd
a = [[[10]],[[20]],[[30]],[[40]]]
b=[[50],[60],[70],[80]]
c=[90,100,110,120]
df = pd.DataFrame(list(zip(a,b,c)),columns=['a','b','c'])
print(df)

The output of the above is:

        a     b    c
0  [[10]]  [50]   90
1  [[20]]  [60]  100
2  [[30]]  [70]  110
3  [[40]]  [80]  120

However, I am looking to get the output as below:

    a   b    c
0  10  50   90
1  20  60  100
2  30  70  110
3  40  80  120

It would really help, if you could suggest how to approach this problem.

The head of the actual dataframe is given below:

           acoeff         bcoeff  refdiff  ref18
0  [[0.33907555]]  [11.51908656]    0.000  0.001
1  [[0.34024954]]  [11.45693353]    0.001  0.001
2  [[0.34134777]]  [11.40045124]    0.002  0.001
3  [[0.34297324]]  [11.33036004]    0.004  0.001
4  [[0.34373931]]   [11.2991075]    0.005  0.001

The head in the dictionary format given below:

{'acoeff': {0: '[[0.33907555]]', 1: '[[0.34024954]]', 2: '[[0.34134777]]', 3: '[[0.34297324]]', 4: '[[0.34373931]]'}, 'bcoeff': {0: '[11.51908656]', 1: '[11.45693353]', 2: '[11.40045124]', 3: '[11.33036004]', 4: '[11.2991075]'}, 'refdiff': {0: 0.0, 1: 0.001, 2: 0.002, 3: 0.004, 4: 0.005}, 'ref18': {0: 0.001, 1: 0.001, 2: 0.001, 3: 0.001, 4: 0.001}}


Solution 1:[1]

strings

strip the [] and convert to numeric:

(df.update(df.select_dtypes(exclude='number')
             .apply(lambda c: pd.to_numeric(c.str.strip('[]'))))
 )
print(df)

real lists

You can unnest the list with the str locator:

df['a'].str[0].str[0]

output:

0    10
1    20
2    30
3    40
Name: a, dtype: int64

To automatize things a bit, you can use a recursive function:

def unnest(x):
    from pandas.api.types import is_numeric_dtype
    if is_numeric_dtype(x):
        return x
    else:
        return unnest(x.str[0])

df2 = df.apply(unnest)

variant using the first item of each Series to determine the nesting level:

def unnest(x):
    from pandas.api.types import is_numeric_dtype
    if len(x)>0 and isinstance(x.iloc[0], list):
        return unnest(x.str[0])
    else:
        return x

df2 = df.apply(unnest)

output:

    a   b    c
0  10  50   90
1  20  60  100
2  30  70  110
3  40  80  120
arbitrary nesting

If you had an arbitrary nesting for each cell, you could use the same logic per element:

def unnest(x):
    if isinstance(x, list) and len(x)>0:
        return unnest(x[0])
    else:
        return x
    
df2 = df.applymap(unnest)

Solution 2:[2]

Maybe not the best solution. But it works.

def ravel_series(s):
    try:
        return np.concatenate(s).ravel()
    except ValueError:
        return s

df.apply(ravel_series)

Solution 3:[3]

You can try this,

Code:

def clean(el):
  if any(isinstance(i, list) for i in el):
    return el[0][0]
  elif isinstance(row, list):
    return el[0]

df['a'] = df.a.apply(clean)
df['b'] = df.b.apply(clean)

print(df)

Output:

    a   b    c
0  10  50   90
1  20  60  100
2  30  70  110
3  40  80  120

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Salvatore Daniele Bianco
Solution 3 shaik moeed