'Custoom pandas wide to long
I have this type of wide dataset which I want to convert to long format. The column choice shows the order of the chosen alternative. So, if it is choice2 it means the second option was chosen and that column should have 1 for the second option and 0 for the others. The columns college, hsg2 coml5 should repeat.
Data:
respondent choice college hsg2 coml5 type1 type2 type3 type4 type5 type6 fuel1 fuel2 fuel3 fuel4 fuel5 fuel6
1 choice1 0 0 0 van regcar van stwagon van truck cng cng electric electric gasoline gasoline
2 choice2 1 1 1 regcar van regcar stwagon regcar truck methanol methanol cng cng gasoline gasoline
This is what I am trying to achieve. I could not find any ways to convert the choice column properly. Any help is appreciated. I looked at the most of the wide to long and vice versa questions, but could not solve my problem.
Solution 1:[1]
One option is pivot_longer from pyjanitor; pivot_longer transforms the data to long form:
# pip install pyjanitor
import pandas as pd
import janitor
(df
.pivot_longer(
index=slice("respondent", "coml5"),
names_to=(".value", "num"),
names_pattern=r"(.+)(.)",
sort_by_appearance=True)
.assign(choice=lambda df: df.choice.str[-1].eq(df.num).astype(int))
.drop(columns="num")
)
respondent choice college hsg2 coml5 type fuel
0 1 1 0 0 0 van cng
1 1 0 0 0 0 regcar cng
2 1 0 0 0 0 van electric
3 1 0 0 0 0 stwagon electric
4 1 0 0 0 0 van gasoline
5 1 0 0 0 0 truck gasoline
6 2 0 1 1 1 regcar methanol
7 2 1 1 1 1 van methanol
8 2 0 1 1 1 regcar cng
9 2 0 1 1 1 stwagon cng
10 2 0 1 1 1 regcar gasoline
11 2 0 1 1 1 truck gasoline
The .value determines which part of the column name remains as a header. In your data the pattern is type followed by a number or fuel followed by a number. .value pairs with type/fuel and stays as a header, the remaining part(the number) goes into the num column. This pattern is what is captured in names_pattern as a regular expression of groups.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | sammywemmy |

