'how to move column values to be titles in pandas
I have been working on a project. my issue is I have repeated column values that I would like to have titles. but nothing seems to be working
so for perspective,running the code below i want to do away with the outage column. and instead have headers of 'Forced '[tot_diff,count,seconds]. same for Planned and Urgent.. bearing in mind that there are multiple entries in Asset Name column.. Thanks
import pandas as pd
import numpy as np
df = pd.DataFrame({"Asset Name":
list('abcdefghijklmnopqrabcdefghijklmnopqr'),
"Outage Type":
["Forced","Planned","Urgent"]*12,
"tot_diff": np.random.randint(0, 3, 36),
"count": np.random.randint(0, 20, 36),
"seconds": np.random.randint(0, 1000, 36)})
df2 = df.pivot(index="Asset Name", columns="Outage Type")
Solution 1:[1]
Something like this?
import pandas as pd
import numpy as np
df = pd.DataFrame({"Asset Name": list('abcdefghijklmnopqr'),
"Outage Type": ["Forced","Planned","Urgent"]*6,
"tot_diff": np.random.randint(0, 3, 18),
"count": np.random.randint(0, 20, 18),
"seconds": np.random.randint(0, 1000, 18)})
df2 = df.pivot(index="Asset Name", columns="Outage Type")
df is the input, similar to your dataframe, and df2 is the output, with separate columns for the different Outage Type.
EDIT
Given that you have changed your question since I answered, so that there are now duplicate Asset Name values, the above no longer works for you.
Taking the same format as the image in your question:
df = pd.DataFrame({"Asset Name":
list('aaabbbcccdddeeefffaaabbbcccdddeeefff'),
"Outage Type":
["Forced","Planned","Urgent"]*12,
"tot_diff": np.random.randint(0, 3, 36),
"count": np.random.randint(0, 20, 36),
"seconds": np.random.randint(0, 1000, 36)})
If there are multiple asset name values, you will need to do one of the following:
Only convert your "Outage Type" values to columns:
df2 = df.pivot(columns="Outage Type")
df2.head()
#Out[23]:
# Asset Name tot_diff ... count seconds
#Outage Type Forced Planned Urgent Forced ... Urgent Forced Planned Urgent
#0 a NaN NaN 1.0 ... NaN 823.0 NaN NaN
#1 NaN a NaN NaN ... NaN NaN 858.0 NaN
#2 NaN NaN a NaN ... 19.0 NaN NaN 719.0
#3 b NaN NaN 1.0 ... NaN 605.0 NaN NaN
#4 NaN b NaN NaN ... NaN NaN 950.0 NaN
#[5 rows x 12 columns]
- This looks messy, because there are so many
nanvalues from all the additional columns.
Or choose a method of aggregation to the duplicate rows (where Asset Name and Outage Type are the same:
df2 = df.pivot_table(index="Asset Name", columns="Outage Type", aggfunc='mean')
df2.head()
#Out[25]:
# count seconds ... tot_diff
#Outage Type Forced Planned Urgent Forced ... Urgent Forced Planned Urgent
#Asset Name ...
#a 13.0 11.0 18.0 693.5 ... 472.0 1.5 0.5 1.0
#b 10.0 9.5 4.5 410.5 ... 147.5 1.5 0.5 1.0
#c 5.0 3.5 13.5 152.5 ... 546.0 1.5 1.0 1.5
#d 16.5 9.5 10.5 427.5 ... 106.0 0.0 2.0 0.5
#e 10.5 6.5 13.5 371.0 ... 405.5 1.0 0.5 0.5
#[5 rows x 9 columns]
- I have chosen to take the
meanof the duplicate rows here, but there are more options available.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |

