'Collapse multiple rows in pandas dataframe and convert data from long to wide afterwards

I have a dataframe with repetitive names of people

Input:

{'name': {0: 'John Smith', 1: 'John Smith', 2: 'John Smith', 3: 'John Doo', 4: 'John Doo', 5: 'John Doo'}, 'journal': {0: 'Journal1', 1: 'Journal2', 2: 'Journal2', 3: 'Journal1', 4: 'Journal2', 5: 'Journal2'}, 'is_editor_in_chief_2019': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editor_in_chief_2020': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editor_in_chief_2021': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editor_in_chief_2022': {0: 1.0, 1: nan, 2: nan, 3: 1.0, 4: nan, 5: nan}, 'is_editorial_board_member_2019': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editorial_board_member_2020': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editorial_board_member_2021': {0: nan, 1: nan, 2: 1.0, 3: nan, 4: nan, 5: 1.0}, 'is_editorial_board_member_2022': {0: nan, 1: 1.0, 2: nan, 3: nan, 4: 1.0, 5: nan}}

I want to collapse these rows and convert data from long to wide.

Expected output:

{'name': {0: 'John Smith', 1: 'John Doo'}, 'Journal1_is_editor_in_chief_2019': {0: nan, 1: nan}, 'Journal1_is_editor_in_chief_2020': {0: nan, 1: nan}, 'Journal1_is_editor_in_chief_2021': {0: nan, 1: nan}, 'Journal1_is_editor_in_chief_2022': {0: 1, 1: 1}, 'Journal1_is_editorial_board_member_2019': {0: nan, 1: nan}, 'Journal1_is_editorial_board_member_2020': {0: nan, 1: nan}, 'Journal1_is_editorial_board_member_2021': {0: 1, 1: 1}, 'Journal1_is_editorial_board_member_2022': {0: 1, 1: 1}, 'Journal2_is_editor_in_chief_2019': {0: nan, 1: nan}, 'Journal2_is_editor_in_chief_2020': {0: nan, 1: nan}, 'Journal2_is_editor_in_chief_2021': {0: nan, 1: nan}, 'Journal2_is_editor_in_chief_2022': {0: nan, 1: nan}, 'Journal2_is_editorial_board_member_2019': {0: nan, 1: nan}, 'Journal2_is_editorial_board_member_2020': {0: nan, 1: nan}, 'Journal2_is_editorial_board_member_2021': {0: nan, 1: nan}, 'Journal2_is_editorial_board_member_2022': {0: nan, 1: nan}}

The closest solution I found is here:

df_out = df.set_index(['name', df.groupby(['name']).cumcount()+1]).unstack().sort_index(level=1, axis=1)
df_out.columns = df_out.columns.map('{0[0]}_{0[1]}'.format)
df_out.reset_index()

But it does not give me the ideal output.



Solution 1:[1]

You can use pivot_table:

df_out = df.pivot_table(index='name', columns='journal', dropna=False)
df_out.columns = df_out.columns.map('{0[1]}_{0[0]}'.format)
df_out.reset_index()

output:

         name  Journal1_is_editor_in_chief_2019  Journal2_is_editor_in_chief_2019  Journal1_is_editor_in_chief_2020  Journal2_is_editor_in_chief_2020  Journal1_is_editor_in_chief_2021  Journal2_is_editor_in_chief_2021  Journal1_is_editor_in_chief_2022  Journal2_is_editor_in_chief_2022  Journal1_is_editorial_board_member_2019  Journal2_is_editorial_board_member_2019  Journal1_is_editorial_board_member_2020  Journal2_is_editorial_board_member_2020  Journal1_is_editorial_board_member_2021  Journal2_is_editorial_board_member_2021  Journal1_is_editorial_board_member_2022  Journal2_is_editorial_board_member_2022
0    John Doo                               NaN                               NaN                               NaN                               NaN                               NaN                               NaN                               1.0                               NaN                                      NaN                                      NaN                                      NaN                                      NaN                                      NaN                                      1.0                                      NaN                                      1.0
1  John Smith                               NaN                               NaN                               NaN                               NaN                               NaN                               NaN                               1.0                               NaN                                      NaN                                      NaN                                      NaN                                      NaN                                      NaN                                      1.0                                      NaN                                      1.0

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mozway