How to extend multilevel columns in Pandas
Given a DataFrame with two-level columns:
      a
  E1_g1 E1_g2 E1_g3 E2_g1 E2_g2 E2_g3 E3_g1 E3_g2 E3_g3
0     4     0     3     3     3     1     3     2     4
1     0     0     4     2     1     0     1     1     0
and a list of tuples:
[('a', 'E1', 'g1'), ('a', 'E1', 'g2'), ('a', 'E1', 'g3'), ('a', 'E2', 'g1'), ('a', 'E2', 'g2'), ('a', 'E2', 'g3'), ('a', 'E3', 'g1'), ('a', 'E3', 'g2'), ('a', 'E3', 'g3')]
The list of tuples is generated by the for loop shown in the accompanying code below.
I would like to expand the columns into three levels using the given list of tuples:
   a
  E1       E2       E3
  g1 g2 g3 g1 g2 g3 g1 g2 g3
0  4  0  3  3  3  1  3  2  4
1  0  0  4  2  1  0  1  1  0
I was under the impression that this could simply be achieved via
df.colums=pd.MultiIndex.from_tuples(ntuple)
However, applying the above leaves the columns unchanged:
      a
  E1_g1 E1_g2 E1_g3 E2_g1 E2_g2 E2_g3 E3_g1 E3_g2 E3_g3
0     4     0     3     3     3     1     3     2     4
1     0     0     4     2     1     0     1     1     0
May I know what I am missing here?
The full code to reproduce the above is below:
import numpy as np
import pandas as pd
np.random.seed(0)
arr = np.random.randint(5, size=(2, 9))
_names = ['a','a','a','a','a','a','a','a','a']
_idx = ['E1_g1','E1_g2','E1_g3',
'E2_g1','E2_g2','E2_g3',
'E3_g1','E3_g2','E3_g3']
columns = pd.MultiIndex.from_arrays([_names, _idx])
df= pd.DataFrame(data=arr, columns=columns)
ntuple=[]
for dg in df.columns:
    A, B = dg
    f, r = B.split('_')
    ntuple.append((A, f, r))
df.colums=pd.MultiIndex.from_tuples(ntuple)
print(df)
Solution 1:[1]
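The approach in the question is right, but the assignment there is spelled df.colums instead of df.columns, so the new index is stored under an unrelated attribute and the real column index is left untouched. If you already have the list of tuples, assign the result of pd.MultiIndex.from_tuples to df.columns: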
tuples = [('a', 'E1', 'g1'), ('a', 'E1', 'g2'), ('a', 'E1', 'g3'), ('a', 'E2', 'g1'), ('a', 'E2', 'g2'), ('a', 'E2', 'g3'), ('a', 'E3', 'g1'), ('a', 'E3', 'g2'), ('a', 'E3', 'g3')]
df.columns = pd.MultiIndex.from_tuples(tuples)
Output:
   a
  E1       E2       E3
  g1 g2 g3 g1 g2 g3 g1 g2 g3
0  4  0  3  3  3  1  3  2  4
1  0  0  4  2  1  0  1  1  0
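To see why a misspelled attribute name has no visible effect, the following sketch assigns the same MultiIndex first to `df.colums` and then to `df.columns`, and compares the number of column levels after each step. It is an illustration only, not part of the original answer. The warning filter is an assumption: pandas may emit a UserWarning when a list-like object is stored under an unknown attribute name.

```python
import warnings

import numpy as np
import pandas as pd

np.random.seed(0)
arr = np.random.randint(5, size=(2, 9))
_names = ['a'] * 9
_idx = [f'E{i}_g{j}' for i in range(1, 4) for j in range(1, 4)]
df = pd.DataFrame(arr, columns=pd.MultiIndex.from_arrays([_names, _idx]))

# Build the three-level index from the existing two-level columns.
new_cols = pd.MultiIndex.from_tuples(
    [(top, *label.split('_')) for top, label in df.columns]
)

# Misspelled name: the index is stored as a new attribute on the object,
# and the real column index keeps its two levels.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas may warn about the new attribute
    df.colums = new_cols
levels_after_typo = df.columns.nlevels

# Correct name: the column index is actually replaced.
df.columns = new_cols
levels_after_fix = df.columns.nlevels

print(levels_after_typo, levels_after_fix)  # prints: 2 3
```

Checking `df.columns.nlevels` right after the assignment is a quick way to confirm that the columns were actually replaced.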
Full code:
import numpy as np
import pandas as pd
np.random.seed(0)
arr = np.random.randint(5, size=(2, 9))
tuples = [('a', 'E1', 'g1'), ('a', 'E1', 'g2'), ('a', 'E1', 'g3'), ('a', 'E2', 'g1'), ('a', 'E2', 'g2'), ('a', 'E2', 'g3'), ('a', 'E3', 'g1'), ('a', 'E3', 'g2'), ('a', 'E3', 'g3')]
df = pd.DataFrame(data=arr)
df.columns = pd.MultiIndex.from_tuples(tuples)
To build the index directly from _names and _idx:
df.columns = pd.MultiIndex.from_tuples([[name]+idx.split('_') for name,idx in zip(_names,_idx)])
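If the intermediate list of tuples is not needed elsewhere, the existing two-level columns can also be converted in one step. The following is a minimal sketch, assuming every second-level label contains the separator exactly where the split should happen. The helper name `split_second_level` is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def split_second_level(columns, sep='_'):
    """Return a MultiIndex whose second level is split on `sep`.

    Hypothetical helper: `columns` is expected to be a two-level
    MultiIndex whose second-level labels are strings.
    """
    return pd.MultiIndex.from_tuples(
        [(top, *label.split(sep)) for top, label in columns]
    )


np.random.seed(0)
arr = np.random.randint(5, size=(2, 9))
_names = ['a'] * 9
_idx = ['E1_g1', 'E1_g2', 'E1_g3',
        'E2_g1', 'E2_g2', 'E2_g3',
        'E3_g1', 'E3_g2', 'E3_g3']
df = pd.DataFrame(arr, columns=pd.MultiIndex.from_arrays([_names, _idx]))

# Replace the two-level columns with the three-level version.
df.columns = split_second_level(df.columns)
print(df)
```

Because the data is left untouched and only the column labels change, a column such as `df[('a', 'E2', 'g3')]` holds the same values that were previously under `('a', 'E2_g3')`.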
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
