'Python unpivot dataframe with duplicate column names
I have a huge excel file with data like this:
DATE OIL GAS WATER OIL GAS WATER
01-01-2018 100 200 300 10 20 30
01-02-2018 200 500 300 30 40 10
01-03-2018 10 50 30 20 40 10
...
How can I transform it to below format using python:
DATE OIL GAS WATER
01-01-2018 100 200 300
01-02-2018 200 500 300
01-03-2018 10 50 30
01-01-2018 10 20 30
01-02-2018 30 40 10
01-03-2018 20 40 10
Solution 1:[1]
You can use pd.melt(Dataframe,...) and concat df(Dataframe) columns, consecutively :
df1 = pd.melt(df,id_vars=['Date'],value_vars=['Oil','Oil.1'], value_name='Oil')
df2 = pd.melt(df,id_vars=['Date'],value_vars=['Gas','Gas.1'], value_name='Gas')
df3 = pd.melt(df,id_vars=['Date'],value_vars=['Water','Water.1'], value_name='Water')
pd.concat([df1.Date,df1.Oil,df2.Gas,df3.Water],axis=1)
Solution 2:[2]
One option is with pivot_longer from pyjanitor, using the .value placeholder:
# install from dev
# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import pandas as pd
import janitor
df.pivot_longer(index = 'DATE', names_to = '.value', names_pattern = r'(.+)')
DATE OIL GAS WATER
0 01-01-2018 100 200 300
1 01-02-2018 200 500 300
2 01-03-2018 10 50 30
3 01-01-2018 10 20 30
4 01-02-2018 30 40 10
5 01-03-2018 20 40 10
The .value simply identifies which part of the columns stay as headers; in this case it aggregates the similar columns into their own unique columns.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | sammywemmy |
