Pivot dataframe based on two dimensions
My dataframe is read from a .txt file of survey questions, with cross marks (X) indicating the answers. It looks like this, and the goal is to transform it into a wide dataframe:
| Name | A | B | C | D | E |
|---|---|---|---|---|---|
| Bob | X | | | | |
| Ted | | X | | | |
| Chris | | X | | | |
I managed to stack the answers by this command:
s = df.set_index('Name').stack().reset_index()
which in turn gives the dataframe the following format:
| Name | level_1 | 0 |
|---|---|---|
| Bob | A | X |
| Bob | B | |
| Bob | C | |
| Bob | D | |
| Bob | E | |
| Ted | A | |
| Ted | B | X |
... and so forth
The end product ideally needs to look like this:
Name | Q1
Bob A
Ted B
Chris B
How can this be done correctly?
Solution 1:[1]
Use melt:
out = df.melt('Name', var_name='Q1').query("value == 'X'").drop(columns='value')
print(out)
# Output
Name Q1
0 Bob A
4 Ted B
5 Chris B
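For anyone who wants to run this end to end, here is a minimal self-contained sketch of the melt approach. The sample frame is rebuilt by hand from the question, and it assumes the empty cells are empty strings (they may just as well be NaN depending on how the file was read; the filter on 'X' works either way). The final reset_index is an optional addition to get a clean 0..n index.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's table.
# Assumption: unticked cells are empty strings.
df = pd.DataFrame({
    'Name': ['Bob', 'Ted', 'Chris'],
    'A': ['X', '', ''],
    'B': ['', 'X', 'X'],
    'C': ['', '', ''],
    'D': ['', '', ''],
    'E': ['', '', ''],
})

out = (
    df.melt('Name', var_name='Q1')   # long format: Name, Q1, value
      .query("value == 'X'")         # keep only the ticked answers
      .drop(columns='value')         # the 'X' marker itself is no longer needed
      .reset_index(drop=True)        # optional: renumber the rows from 0
)
print(out)
```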
Solution 2:[2]
If this really is a reshaping problem rather than a file-reading one, you could use stack, which drops the NaNs by default, so no explicit selection is needed:
(df.set_index('Name')
.where(lambda d: d.eq('X')) # this is the important part: everything that is not 'X' becomes NaN
.rename_axis('Q1', axis=1)
.stack()
.reset_index()
.drop(columns=0)
)
output:
Name Q1
0 Bob A
1 Ted B
2 Chris B
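A runnable sketch of the stack approach, under the same assumptions as the question (hand-built sample frame, empty strings for unticked cells). It uses where to keep only the 'X' cells and turn everything else into NaN. The explicit dropna is a precaution added here, not part of the original answer: the NaN handling of stack depends on the pandas version, so dropping explicitly should give the same result either way.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's table.
# Assumption: unticked cells are empty strings.
df = pd.DataFrame({
    'Name': ['Bob', 'Ted', 'Chris'],
    'A': ['X', '', ''],
    'B': ['', 'X', 'X'],
    'C': ['', '', ''],
    'D': ['', '', ''],
    'E': ['', '', ''],
})

out = (
    df.set_index('Name')
      .where(lambda d: d.eq('X'))    # keep 'X', replace everything else with NaN
      .rename_axis('Q1', axis=1)     # name the column axis so it becomes 'Q1'
      .stack()                       # one row per (Name, Q1) pair
      .dropna()                      # precaution: drop NaN pairs explicitly
      .reset_index()                 # columns: Name, Q1, 0
      .drop(columns=0)               # the 'X' marker itself is no longer needed
)
print(out)
```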
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Corralien |
| Solution 2 | mozway |
