'Collapse multiple rows in pandas dataframe and convert data from long to wide afterwards
I have a dataframe with repetitive names of people
Input:
{'name': {0: 'John Smith', 1: 'John Smith', 2: 'John Smith', 3: 'John Doo', 4: 'John Doo', 5: 'John Doo'}, 'journal': {0: 'Journal1', 1: 'Journal2', 2: 'Journal2', 3: 'Journal1', 4: 'Journal2', 5: 'Journal2'}, 'is_editor_in_chief_2019': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editor_in_chief_2020': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editor_in_chief_2021': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editor_in_chief_2022': {0: 1.0, 1: nan, 2: nan, 3: 1.0, 4: nan, 5: nan}, 'is_editorial_board_member_2019': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editorial_board_member_2020': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editorial_board_member_2021': {0: nan, 1: nan, 2: 1.0, 3: nan, 4: nan, 5: 1.0}, 'is_editorial_board_member_2022': {0: nan, 1: 1.0, 2: nan, 3: nan, 4: 1.0, 5: nan}}
I want to collapse these rows and convert data from long to wide.
Expected output:
{'name': {0: 'John Smith', 1: 'John Doo'}, 'Journal1_is_editor_in_chief_2019': {0: nan, 1: nan}, 'Journal1_is_editor_in_chief_2020': {0: nan, 1: nan}, 'Journal1_is_editor_in_chief_2021': {0: nan, 1: nan}, 'Journal1_is_editor_in_chief_2022': {0: 1, 1: 1}, 'Journal1_is_editorial_board_member_2019': {0: nan, 1: nan}, 'Journal1_is_editorial_board_member_2020': {0: nan, 1: nan}, 'Journal1_is_editorial_board_member_2021': {0: 1, 1: 1}, 'Journal1_is_editorial_board_member_2022': {0: 1, 1: 1}, 'Journal2_is_editor_in_chief_2019': {0: nan, 1: nan}, 'Journal2_is_editor_in_chief_2020': {0: nan, 1: nan}, 'Journal2_is_editor_in_chief_2021': {0: nan, 1: nan}, 'Journal2_is_editor_in_chief_2022': {0: nan, 1: nan}, 'Journal2_is_editorial_board_member_2019': {0: nan, 1: nan}, 'Journal2_is_editorial_board_member_2020': {0: nan, 1: nan}, 'Journal2_is_editorial_board_member_2021': {0: nan, 1: nan}, 'Journal2_is_editorial_board_member_2022': {0: nan, 1: nan}}
The closest solution I found is here:
df_out = df.set_index(['name', df.groupby(['name']).cumcount()+1]).unstack().sort_index(level=1, axis=1)
df_out.columns = df_out.columns.map('{0[0]}_{0[1]}'.format)
df_out.reset_index()
But it does not give me the ideal output.
Solution 1:[1]
You can use pivot_table
:
df_out = df.pivot_table(index='name', columns='journal', dropna=False)
df_out.columns = df_out.columns.map('{0[1]}_{0[0]}'.format)
df_out.reset_index()
output:
name Journal1_is_editor_in_chief_2019 Journal2_is_editor_in_chief_2019 Journal1_is_editor_in_chief_2020 Journal2_is_editor_in_chief_2020 Journal1_is_editor_in_chief_2021 Journal2_is_editor_in_chief_2021 Journal1_is_editor_in_chief_2022 Journal2_is_editor_in_chief_2022 Journal1_is_editorial_board_member_2019 Journal2_is_editorial_board_member_2019 Journal1_is_editorial_board_member_2020 Journal2_is_editorial_board_member_2020 Journal1_is_editorial_board_member_2021 Journal2_is_editorial_board_member_2021 Journal1_is_editorial_board_member_2022 Journal2_is_editorial_board_member_2022
0 John Doo NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN 1.0 NaN 1.0
1 John Smith NaN NaN NaN NaN NaN NaN 1.0 NaN NaN NaN NaN NaN NaN 1.0 NaN 1.0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | mozway |