'Converting a pandas dataframe to multi-index and changing values

I have the following dataframe:

d = [{'AX':['Rec=1','POSi=2'], 'AVF1':[], 'HI':['Rec=343', 'POSi=4'], 'version_1':[]},
  {'AX':[], 'AVF1':['Rec=4', 'POSi=454'], 'HI':[], 'version_2':[]},
  {'AX':['Rec=3', 'POSi=32'], 'AVF1':['Rec=6', 'POSi=12'], 'HI':[], 'version_3':[]}]

frame = pd.DataFrame(d)

frame = frame.melt(cols).dropna().drop('value', axis=1).set_index('variable').T

Current output:

variable    version_1          version_2           version_3
AX          [Rec=1, POSi=2]    []                  [Rec=3, POSi=32]
AVF1        []                 [Rec=4, POSi=454]   [Rec=6, POSi=12]
HI          [Rec=343, POSi=4]  []                  []

I am trying to move the version_1, version_2 etc to a higher level and add part of the values as headers like so:

            version_1          version_2           version_3

            Rec     POSi       Rec     Posi        Rec     POSi
AX          1       2          NAN     NAN         3       32
AVF1        NAN     NAN        4       454         6       12
HI          343     4          NAN     NAN         NAN     NAN

The main data frame I am working on has a very large number of columns so it's not possible to manually type out all the variables. I would upload but my internet is pretty slow.

I have tried:

frame.columns = [['version_1', 'version_2','version_3'], frame.columns]
frame.columns.set_levels(['Rec','POSi','c'], level=1, inplace=True)

But this does not work.

Any ideas would be great



Solution 1:[1]

This can be done, in one line, with a combination of pd.json_normalize, set_index, set_axis, MultiIndex.from_product, and pd.concat:

new_df = pd.concat([
        pd.json_normalize(
            df[c].apply(
                lambda x:
                dict([x.split('=') for x in x])
            )
        )
        .set_index(df.index)
        .pipe(
            lambda x:
            x.set_axis(pd.MultiIndex.from_product([[c], x.columns]), axis=1)
        )

        for c in df
    ],
    axis=1
)

Output:

>>> new_df
     version_1      version_2      version_3
           Rec POSi       Rec POSi       Rec POSi
AX           1    2       NaN  NaN         3   32
AVF1       NaN  NaN         4  454         6   12
HI         343    4       NaN  NaN       NaN  NaN

>>> new_df.columns
MultiIndex([('version_1',  'Rec'),
            ('version_1', 'POSi'),
            ('version_2',  'Rec'),
            ('version_2', 'POSi'),
            ('version_3',  'Rec'),
            ('version_3', 'POSi')],
           )

Compact version of the above:

new_df = pd.concat([pd.json_normalize(df[c].apply(lambda x: dict([x.split('=') for x in x]))).set_index(df.index).pipe(lambda x: x.set_axis(pd.MultiIndex.from_product([[c], x.columns]), axis=1)) for c in df], axis=1)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 richardec