Pandas function to rename certain column values based on a boolean condition in another column
I'm trying to clean a dataset that has demographic information for my company.
There is a text column for "Race" that contains the values ['White', 'Black', 'Asian', 'Two or More Races']. There is another boolean column for "Hispanic or Latino" that is either a 0 for no or a 1 for yes.
What I need to do is replace the values in the "Race" column with "Hispanic/Latino" wherever the "Hispanic or Latino" column equals 1, UNLESS the race is "Two or More Races", which should stay the same. Does anybody have a good solution to this? I'm relatively new to pandas and I've tried using df.loc to solve this, but the examples I see aren't as specific as mine.
Solution 1:[1]
You can build a boolean mask that selects the rows to change, then assign to those rows with df.loc:
mask = (df["Hispanic or Latino"] == 1) & (df['Race'] != 'Two or More Races')
df.loc[mask, 'Race'] = 'Hispanic/Latino'
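If it helps to see what the mask contains, here is a minimal sketch that builds the two conditions separately before combining them. The sample data and the names is_hispanic and not_multiracial are only for illustration. Note that each comparison needs its own parentheses when written on one line, because & binds more tightly than == and != in Python.

```python
import pandas as pd

# Illustrative sample data; column names follow the question
df = pd.DataFrame({
    "Race": ["White", "Black", "Asian", "Two or More Races"],
    "Hispanic or Latino": [0, 1, 0, 1],
})

# Each comparison yields a boolean Series aligned with df's index
is_hispanic = df["Hispanic or Latino"] == 1
not_multiracial = df["Race"] != "Two or More Races"

# & combines the two Series element-wise (logical AND)
mask = is_hispanic & not_multiracial

print(mask.tolist())  # [False, True, False, False]
```

Only the second row satisfies both conditions, so it is the only one df.loc[mask, 'Race'] would overwrite.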
Tested on a simple example:
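import pandas as pd

# Small sample frame with the two columns from the question
df = pd.DataFrame({
    'Race': ['White', 'Black', 'Asian', 'Two or More Races'],
    "Hispanic or Latino": [0, 1, 0, 1],
})

# True only for Hispanic/Latino rows that are not 'Two or More Races'
mask = (df["Hispanic or Latino"] == 1) & (df['Race'] != 'Two or More Races')
# Overwrite 'Race' in the selected rows only
df.loc[mask, 'Race'] = 'Hispanic/Latino'

print(df)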
Result:
                Race  Hispanic or Latino
0              White                   0
1    Hispanic/Latino                   1
2              Asian                   0
3  Two or More Races                   1
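Since the question asks for a function, the same logic could be wrapped up roughly like this. This is only a sketch: relabel_hispanic and its parameter names are made up for illustration, and it uses Series.mask on a copy instead of assigning through df.loc, so the original frame is left untouched.

```python
import pandas as pd


def relabel_hispanic(df, race_col="Race", flag_col="Hispanic or Latino",
                     keep="Two or More Races", new_label="Hispanic/Latino"):
    """Return a copy of df with race_col relabelled where flag_col equals 1.

    Rows whose race equals `keep` are left unchanged.
    """
    cond = (df[flag_col] == 1) & (df[race_col] != keep)
    out = df.copy()
    # Series.mask replaces values where the condition is True
    out[race_col] = out[race_col].mask(cond, new_label)
    return out


# Illustrative sample data; column names follow the question
df = pd.DataFrame({
    "Race": ["White", "Black", "Asian", "Two or More Races"],
    "Hispanic or Latino": [0, 1, 0, 1],
})

cleaned = relabel_hispanic(df)
print(cleaned["Race"].tolist())
# ['White', 'Hispanic/Latino', 'Asian', 'Two or More Races']
```

The comparison with 1 should also work if the flag column holds real booleans, because True == 1 evaluates to True in pandas.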
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | furas |
