'Python unpivot dataframe with duplicate column names

I have a huge excel file with data like this:

DATE          OIL   GAS   WATER   OIL   GAS   WATER

01-01-2018    100   200   300     10    20    30    
01-02-2018    200   500   300     30    40    10    
01-03-2018    10    50    30      20    40    10 

...

How can I transform it to below format using python:

DATE         OIL   GAS   WATER

01-01-2018   100   200   300    
01-02-2018   200   500   300    
01-03-2018   10    50    30    
01-01-2018   10    20    30    
01-02-2018   30    40    10    
01-03-2018   20    40    10


Solution 1:[1]

You can use pd.melt(Dataframe,...) and concat df(Dataframe) columns, consecutively :

df1 = pd.melt(df,id_vars=['Date'],value_vars=['Oil','Oil.1'], value_name='Oil')
df2 = pd.melt(df,id_vars=['Date'],value_vars=['Gas','Gas.1'], value_name='Gas')
df3 = pd.melt(df,id_vars=['Date'],value_vars=['Water','Water.1'], value_name='Water')

pd.concat([df1.Date,df1.Oil,df2.Gas,df3.Water],axis=1)

Solution 2:[2]

One option is with pivot_longer from pyjanitor, using the .value placeholder:

# install from dev
# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import pandas as pd
import janitor

df.pivot_longer(index = 'DATE', names_to = '.value', names_pattern = r'(.+)')

         DATE  OIL  GAS  WATER
0  01-01-2018  100  200    300
1  01-02-2018  200  500    300
2  01-03-2018   10   50     30
3  01-01-2018   10   20     30
4  01-02-2018   30   40     10
5  01-03-2018   20   40     10

The .value simply identifies which part of the columns stay as headers; in this case it aggregates the similar columns into their own unique columns.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 sammywemmy