'Sorting Pandas dataframe by multiple conditions

I have a large dataframe (thousands of rows by hundreds of columns), a short excerpt is as the following:

data = {'Step':['', '', '', 'First', 'First', 'Second', 'Third', 'Second', 'First', 'Second', 'First', 'First', 'Second', 'Second'], 
'Stuff':['tot', 'white', 'random', 7583, 3563, 824, 521, 7658, 2045, 33, 9823, 5, 8090, 51],
'Mark':['marking', '', '', 1, 5, 5, 5, 1, 27, 27, 1, 6, 1, 9],
'A':['item_a', 100, 'st1', 142, 2, 2, 2, 100, 150, 105, 118, 118, 162, 156], 
'B':['skill', 66, 'abc', 160, 2, 130, 140, 169, 1, 2, 130, 140, 144, 127],
'C':['item', 50, 'st1', 2000, 2, 65, 2001, 1999, 1, 2, 2000, 4, 2205, 2222],
'D':['item_c', 100, 'st1', 433, 430, 150, 170, 130, 1, 2, 300, 4, 291, 606],
'E':['test', 90, 'st1', 111, 130, 5, 10, 160, 1, 2, 232, 4, 144, 113],
'F':['done', 80, 'abc', 765, 755, 5, 10, 160, 1, 2, 733, 4, 666, 500],
'G':['nd', 90, 'mag', 500, 420, 5, 10, 160, 1, 2, 300, 4, 469, 500],
'H':['prt', 100, 'st1', 999, 200, 5, 10, 160, 1, 2, 477, 4, 620, 7],
'Name':['NS', '', '', "Pat", "Lucy", "Lucy", "Lucy", "Nick", "Kirk", "Kirk", "Joe", "Nico", "Nico", "Bryan"],
'Value':[ -1, 0, 0, 0, 3, 6, 5, 0, 7, 7, 0, 6, 0, 1]}
df = pd.DataFrame(data)

I need to sort this dataframe according to the following conditions that have to be satisfied all together:

  • In the "Name" column, names that are the same are to remain grouped (e.g. there are 3 records of "Lucy" next to each other, and they cannot be moved apart)
  • For each group of names, the appearance order has to remain the one given by the "Step" column (e.g. the first appearance of "Lucy" is related to the value "First" in the "Step" column, the second to "Second" and so on)
  • All the remaining names that in the "Value" column have a value = 0, have to be moved below the others (e.g. "Pat" can be moved after the others, but not "Nico" because there are two records of "Nico" and the other one has a value = 6)
  • The first three rows cannot be moved

What I have done is to concatenate different sub-dataframes:

df_groupnames=df[df.duplicated(subset=['Name'], keep=False)]
df_nogroup = df[~df.duplicated(subset=['Name'], keep=False)]
df_nogroup_high = df_nogroup[df_nogroup["Value"] > 0 ]
df_nogroup_null = df_nogroup[df_nogroup["Value"] == 0]
# Let's concatenate these dataframes to get the sorted one
df_sorted = pd.concat([df_groupnames, df_nogroup_high, df_nogroup_null])

It works, but I wonder if there's a smarter, simpler way, and maybe faster, to obtain the same. Thank you for your attention.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source