'Convert pandas dataframe from wide to long
I have a pandas.Dataframe with the following columns:
a_1 ab_1 ac_1 a_2 ab_2 ac_2
2 3 4 5 6 7
How do I convert it into the following?
a ab ac
2 3 4
5 6 7
I was trying to use pandas melt to convert from wide to long format, but not sure of the syntax.
Solution 1:[1]
You can use split for MultiIndex and then reshape by stack and last use reset_index for remove MultiIndex:
df.columns = df.columns.str.split('_', expand=True)
df = df.stack().reset_index(drop=True)
print (df)
a ab ac
0 2 3 4
1 5 6 7
df = df.stack().reset_index(level=0, drop=True)
print (df)
a ab ac
1 2 3 4
2 5 6 7
Solution 2:[2]
You can replace the columns by a multi-index and stack:
df.columns = pd.MultiIndex.from_tuples(df.columns.str.split('_').map(tuple))
df = df.stack()
Solution 3:[3]
Here is one way to do that:
Code:
df.columns = pd.MultiIndex.from_tuples(
[c.split('_') for c in df.columns], names=['col', 'row'])
df.melt().pivot(index='row', columns='col', values='value')
How?
Create a
pandas.MultiIndexfor the columns by splitting on_.meltthe data frame and thenpivoton the elements from the original column names.
Test Code:
df = pd.DataFrame(
data=[range(2, 8)],
columns='a_1 ab_1 ac_1 a_2 ab_2 ac_2'.split()
)
print(df)
df.columns = pd.MultiIndex.from_tuples(
[c.split('_') for c in df.columns], names=['col', 'row'])
print(df.melt().pivot(index='row', columns='col', values='value'))
Results:
a_1 ab_1 ac_1 a_2 ab_2 ac_2
0 2 3 4 5 6 7
col a ab ac
row
1 2 3 4
2 5 6 7
pandas < 0.20.0
If using pandas prior to 0.20.0, melt() like:
print(pd.melt(df).pivot(index='row', columns='col', values='value'))
Solution 4:[4]
You can split the dataframe into two, then rename the columns, and finally concatenate them:
cols = ['a', 'ab', 'ac']
df1 = df[["a_1", "ab_1", "ac_1"]]
df2 = df[["a_2", "ab_2", "ac_2"]]
df1.columns = cols
df2.columns = cols
df3 = pd.concat([df1, df2], ignore_index=True)
Solution 5:[5]
If you want to use pnd.melt you should probably use the value_vars and value_name parameters:
df_a = pnd.melt(df, value_vars=['a_1', 'a_2'], value_name='a')[['a']]
df_ab = pnd.melt(df, value_vars=['ab_1', 'ab_2'], value_name='ab')[['ab']]
df_ac = pnd.melt(df, value_vars=['ac_1', 'ac_2'], value_name='ac')[['ac']]
df_final = df_a.join(df_b).join(df_c)
Alternatively, taking a more functional approach:
col_prefixes = ['a', 'ab', 'ac']
df_cuts = map(lambda x: pnd.melt(df, value_vars=['%s_1' % x, '%s_2' % x], value_name=x)[[x]], col_prefixes)
df_final = reduce(lambda x, y: x.join(y), df_cuts)
Solution 6:[6]
There is builtin function wide_to_long for more see in documentation:
In [115]: df
Out[115]:
a_1 ab_1 ac_1 a_2 ab_2 ac_2
0 2 3 4 5 6 7
In [116]: df['id'] = df.index
In [117]: df
Out[117]:
a_1 ab_1 ac_1 a_2 ab_2 ac_2 id
0 2 3 4 5 6 7 0
In [118]: pd.wide_to_long(df, ['a','ab','ac'],i='id',j='num',sep='_')
Out[118]:
a ab ac
id num
0 1 2 3 4
2 5 6 7
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jezrael |
| Solution 2 | hilberts_drinking_problem |
| Solution 3 | |
| Solution 4 | Diego Mora Cespedes |
| Solution 5 | peterfields |
| Solution 6 | shivsn |
