'Flattening a dataframe [duplicate]
I'm new to Python and would appreciate if someone could please help on this:
I have a dataset that looks like this:
| Celebrity1 | Celebrity2 | Celebrity3 | Votes |
|---|---|---|---|
| Brad Pitt | Angelina Jolie | - | 49 |
| Will Smith | Chris Rock | Justin Bieber | 55 |
I want to "flatten" this database so it looks like this:
| Celebrity | Votes |
|---|---|
| Brad Pitt | 49 |
| Angelina Jolie | 49 |
| Will Smith | 55 |
| Chris Rock | 55 |
| Justin Bieber | 55 |
How can I do this in Python?
#here is the code for the initial dataframe
data = {'Celebrity1': ['Brad Pitt','Will Smith'], 'Celebrity2': ['Angelina Jolie', 'Chris Rock'],'Celebrity3': [None,'Justin Bieber'],'Votes': [49,55]}
df = pd.DataFrame(data)
Solution 1:[1]
You could use wide_to_long:
out = (pd.wide_to_long(df.reset_index(), stubnames='Celebrity', i='index', j='j')
.dropna().sort_values(by='Votes').reset_index(drop=True))
Another option is to use set_index + stack:
df = df.set_index('Votes').stack().droplevel(-1).reset_index(name='Celebrity')[['Celebrity','Votes']]
Output:
Celebrity Votes
0 Brad Pitt 49
1 Angelina Jolie 49
2 Will Smith 55
3 Chris Rock 55
4 Justin Bieber 55
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
