'Creating a subset of data frame using condition in python and pandas

suppose I have this data frame I want to create a subset of it based on the conditions below.

df=pd.DataFrame({'file':[1205,2897,1205,1205,4312,1322,1242,52,2897,111],
                         'department':[finance,finance,IT,marketing,marketing,IT,finance,IT,marketing,IT],
                         'status':[1,1,1,1,1,1,1,1,1,1],
                         })
   file department  status
0   1205    finance       1
1   2897    finance       1
2   1205       IT         1
3   1205    marketing     1
4   4312    marketing     1
5   1322       IT         1
6   1242    finance       1
7   52         IT         1
8   2897    marketing     1
9   111        IT         1
  • if the file exist in finance and exist in IT delete it from finance and keep it in IT
  • if the file exist in finance and marketing and IT REMOVE FROM FIRST 2 AND KEEP IT IN IT
  • if the file exist in finance and marketing delete from the first and keep it in the marketing
  • if the file exist in marketing and IT delete from the
    first and keep it in the IT

THE EXPECTED RESULT :

   file department  status
0   1205       IT         1
1   2897    marketing     1
2   4312    marketing     1
3   1322       IT         1
4   1242    finance       1
5   52         IT         1
6   111        IT         1


Solution 1:[1]

Use CategoricalDtype to create an ordered collection such as 'finance' < 'marketing' < 'IT':

cat = pd.CategoricalDtype(['finance', 'marketing', 'IT'], ordered=True)
out = (df.astype({'department': cat}).sort_values('department')
         .drop_duplicates('file', keep='last').sort_index())
print(out)

# Output
   file department  status
2  1205         IT       1
4  4312  marketing       1
5  1322         IT       1
6  1242    finance       1
7    52         IT       1
8  2897  marketing       1
9   111         IT       1

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Corralien