'Custoom pandas wide to long

I have this type of wide dataset which I want to convert to long format. The column choice shows the order of the chosen alternative. So, if it is choice2 it means the second option was chosen and that column should have 1 for the second option and 0 for the others. The columns college, hsg2 coml5 should repeat.

Data:

respondent  choice  college hsg2    coml5   type1   type2   type3   type4   type5   type6   fuel1   fuel2   fuel3   fuel4   fuel5   fuel6
1   choice1 0   0   0   van regcar  van stwagon van truck   cng cng electric    electric    gasoline    gasoline
2   choice2 1   1   1   regcar  van regcar  stwagon regcar  truck   methanol    methanol    cng cng gasoline    gasoline

This is what I am trying to achieve. I could not find any ways to convert the choice column properly. Any help is appreciated. I looked at the most of the wide to long and vice versa questions, but could not solve my problem.

enter image description here



Solution 1:[1]

One option is pivot_longer from pyjanitor; pivot_longer transforms the data to long form:

# pip install pyjanitor
import pandas as pd
import janitor

(df
.pivot_longer(
     index=slice("respondent", "coml5"),
     names_to=(".value", "num"),
     names_pattern=r"(.+)(.)",
     sort_by_appearance=True)
.assign(choice=lambda df: df.choice.str[-1].eq(df.num).astype(int))
.drop(columns="num")
)

    respondent  choice  college  hsg2  coml5     type      fuel
0            1       1        0     0      0      van       cng
1            1       0        0     0      0   regcar       cng
2            1       0        0     0      0      van  electric
3            1       0        0     0      0  stwagon  electric
4            1       0        0     0      0      van  gasoline
5            1       0        0     0      0    truck  gasoline
6            2       0        1     1      1   regcar  methanol
7            2       1        1     1      1      van  methanol
8            2       0        1     1      1   regcar       cng
9            2       0        1     1      1  stwagon       cng
10           2       0        1     1      1   regcar  gasoline
11           2       0        1     1      1    truck  gasoline

The .value determines which part of the column name remains as a header. In your data the pattern is type followed by a number or fuel followed by a number. .value pairs with type/fuel and stays as a header, the remaining part(the number) goes into the num column. This pattern is what is captured in names_pattern as a regular expression of groups.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 sammywemmy