'How to convert the dummy variable columns in to several columns?
I know how to unstack rows into columns, but how to deal with the following dataframe?
| date | dummy | avg | lable |
|---|---|---|---|
| 1-19 | 1 | 20 | l1 |
| 1-19 | 0 | 40 | l1 |
| 1-27 | 1 | 100 | l2 |
| 1-27 | 0 | 140 | l2 |
Expected dataframe:
| date | avg_t | avg_c | lable |
|---|---|---|---|
| 1-19 | 20 | 40 | l1 |
| 1-27 | 100 | 140 | l2 |
The avg is 20 when dummy equals 1 and it is renamed to avg_t as a column. Similar to the column avg_c.
I tried:
df.groupby(['dummy','avg']).size().unstack
But, It does not work.
Solution 1:[1]
Create new column by DataFrame.assign with Series.map and then use DataFrame.pivot with DataFrame.add_prefix:
df = (df.assign(new = df['dummy'].map({0:'c', 1:'t'}))
.pivot(['date','lable'], 'new', 'avg')
.add_prefix('avg_')
.reset_index()
.rename_axis(None, axis=1))
print (df)
date lable avg_c avg_t
0 1-19 l1 40 20
1 1-27 l2 140 100
Solution 2:[2]
You could also use pivot_wider from janitor:.
# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_wider(index = ['date', 'lable'], names_from = 'dummy')
Out[19]:
date lable avg_0 avg_1
0 1-19 l1 40 20
1 1-27 l2 140 100
if you want the t and c:
(df.assign(dummy = df['dummy'].map({0:'c', 1:'t'})).
pivot_wider(index = ['date', 'lable'], names_from = 'dummy'))
Out[20]:
date lable avg_c avg_t
0 1-19 l1 40 20
1 1-27 l2 140 100
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | sammywemmy |
