'Converting a pandas dataframe to multi-index and changing values
I have the following dataframe:
d = [{'AX':['Rec=1','POSi=2'], 'AVF1':[], 'HI':['Rec=343', 'POSi=4'], 'version_1':[]},
{'AX':[], 'AVF1':['Rec=4', 'POSi=454'], 'HI':[], 'version_2':[]},
{'AX':['Rec=3', 'POSi=32'], 'AVF1':['Rec=6', 'POSi=12'], 'HI':[], 'version_3':[]}]
frame = pd.DataFrame(d)
frame = frame.melt(cols).dropna().drop('value', axis=1).set_index('variable').T
Current output:
variable version_1 version_2 version_3
AX [Rec=1, POSi=2] [] [Rec=3, POSi=32]
AVF1 [] [Rec=4, POSi=454] [Rec=6, POSi=12]
HI [Rec=343, POSi=4] [] []
I am trying to move the version_1, version_2 etc to a higher level and add part of the values as headers like so:
version_1 version_2 version_3
Rec POSi Rec Posi Rec POSi
AX 1 2 NAN NAN 3 32
AVF1 NAN NAN 4 454 6 12
HI 343 4 NAN NAN NAN NAN
The main data frame I am working on has a very large number of columns so it's not possible to manually type out all the variables. I would upload but my internet is pretty slow.
I have tried:
frame.columns = [['version_1', 'version_2','version_3'], frame.columns]
frame.columns.set_levels(['Rec','POSi','c'], level=1, inplace=True)
But this does not work.
Any ideas would be great
Solution 1:[1]
This can be done, in one line, with a combination of pd.json_normalize
, set_index
, set_axis
, MultiIndex.from_product
, and pd.concat
:
new_df = pd.concat([
pd.json_normalize(
df[c].apply(
lambda x:
dict([x.split('=') for x in x])
)
)
.set_index(df.index)
.pipe(
lambda x:
x.set_axis(pd.MultiIndex.from_product([[c], x.columns]), axis=1)
)
for c in df
],
axis=1
)
Output:
>>> new_df
version_1 version_2 version_3
Rec POSi Rec POSi Rec POSi
AX 1 2 NaN NaN 3 32
AVF1 NaN NaN 4 454 6 12
HI 343 4 NaN NaN NaN NaN
>>> new_df.columns
MultiIndex([('version_1', 'Rec'),
('version_1', 'POSi'),
('version_2', 'Rec'),
('version_2', 'POSi'),
('version_3', 'Rec'),
('version_3', 'POSi')],
)
Compact version of the above:
new_df = pd.concat([pd.json_normalize(df[c].apply(lambda x: dict([x.split('=') for x in x]))).set_index(df.index).pipe(lambda x: x.set_axis(pd.MultiIndex.from_product([[c], x.columns]), axis=1)) for c in df], axis=1)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | richardec |