'Creating a subset of data frame using condition in python and pandas
suppose I have this data frame I want to create a subset of it based on the conditions below.
df=pd.DataFrame({'file':[1205,2897,1205,1205,4312,1322,1242,52,2897,111],
'department':[finance,finance,IT,marketing,marketing,IT,finance,IT,marketing,IT],
'status':[1,1,1,1,1,1,1,1,1,1],
})
file department status
0 1205 finance 1
1 2897 finance 1
2 1205 IT 1
3 1205 marketing 1
4 4312 marketing 1
5 1322 IT 1
6 1242 finance 1
7 52 IT 1
8 2897 marketing 1
9 111 IT 1
- if the file exist in finance and exist in IT delete it from finance and keep it in IT
- if the file exist in finance and marketing and IT REMOVE FROM FIRST 2 AND KEEP IT IN IT
- if the file exist in finance and marketing delete from the first and keep it in the marketing
- if the file exist in marketing and IT delete from the
first and keep it in the IT
THE EXPECTED RESULT :
file department status
0 1205 IT 1
1 2897 marketing 1
2 4312 marketing 1
3 1322 IT 1
4 1242 finance 1
5 52 IT 1
6 111 IT 1
Solution 1:[1]
Use CategoricalDtype to create an ordered collection such as 'finance' < 'marketing' < 'IT':
cat = pd.CategoricalDtype(['finance', 'marketing', 'IT'], ordered=True)
out = (df.astype({'department': cat}).sort_values('department')
.drop_duplicates('file', keep='last').sort_index())
print(out)
# Output
file department status
2 1205 IT 1
4 4312 marketing 1
5 1322 IT 1
6 1242 finance 1
7 52 IT 1
8 2897 marketing 1
9 111 IT 1
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Corralien |
