'Creating a subset of data frame using condition in python and pandas

suppose I have this data frame I want to create a subset of it based on the conditions below.

df=pd.DataFrame({'file':[1205,2897,1205,1205,4312,1322,1242,52,2897,111],
                         'department':[finance,finance,IT,marketing,marketing,IT,finance,IT,marketing,IT],
                         'status':[1,1,1,1,1,1,1,1,1,1],
                         })

   file department  status
0   1205    finance       1
1   2897    finance       1
2   1205       IT         1
3   1205    marketing     1
4   4312    marketing     1
5   1322       IT         1
6   1242    finance       1
7   52         IT         1
8   2897    marketing     1
9   111        IT         1

if the file exist in finance and exist in IT delete it from finance and keep it in IT
if the file exist in finance and marketing and IT REMOVE FROM FIRST 2 AND KEEP IT IN IT
if the file exist in finance and marketing delete from the first and keep it in the marketing
if the file exist in marketing and IT delete from the
first and keep it in the IT

THE EXPECTED RESULT :

   file department  status
0   1205       IT         1
1   2897    marketing     1
2   4312    marketing     1
3   1322       IT         1
4   1242    finance       1
5   52         IT         1
6   111        IT         1

Solution 1:^[1]

Use CategoricalDtype to create an ordered collection such as 'finance' < 'marketing' < 'IT':

cat = pd.CategoricalDtype(['finance', 'marketing', 'IT'], ordered=True)
out = (df.astype({'department': cat}).sort_values('department')
         .drop_duplicates('file', keep='last').sort_index())
print(out)

# Output
   file department  status
2  1205         IT       1
4  4312  marketing       1
5  1322         IT       1
6  1242    finance       1
7    52         IT       1
8  2897  marketing       1
9   111         IT       1

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Corralien

'Creating a subset of data frame using condition in python and pandas

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]