'How to sort by multiple columns for all values and not just duplicates - python
I have a pandas dataframe for which I need to sort (by ascending) the values by two columns with the output being a "middle ground" of the two columns.
An example is shown bellow. When I use sort_values it sorts by the first columns and considers the second one only for duplicate values. I, however, need to get the row that have the combinaison of lower values for both columns (which is the 3rd one in the ouput bellow).
test = pd.DataFrame({'file':[1,2,3,4,5,6], 'rmse':[66,41,43,39,40,42], 'var':[44,177,201,321,349,379]})
test.sort_values(by=['rmse', 'var'], ascending=[True, True])
Output :
file rmse var
3 4 39 321 <--- First row given by `sort_values`
4 5 40 349
1 2 41 177 <--- Row that I need
5 6 42 379
2 3 43 201
0 1 66 44
I'm not sure how to phrase my question properly in English so please tell me if I need to make my question more clear.
Solution 1:[1]
IIUC, let's use rank, mean, and argsort:
test.iloc[test[['var', 'rmse']].rank().mean(axis=1).argsort()]
Output:
file rmse var
1 2 41 177
3 4 39 321
0 1 66 44
4 5 40 349
2 3 43 201
5 6 42 379
Details, rank the values in each column, then average the ranks for each row and sort the mean ranks to determine row order.
Solution 2:[2]
I've tried all the methods of df.sort.values but instead of that you can try a for loop like this :
import pandas as pd
test = pd.DataFrame({'file':[1,2,3,4,5,6], 'rmse':[66,41,43,39,40,42], 'var':[44,177,201,321,349,379]})
for i in test:
test[i]=sorted(test[i])
print(test)
Output :
file rmse var
0 1 39 44
1 2 40 177
2 3 41 201
3 4 42 321
4 5 43 349
5 6 66 379
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Scott Boston |
| Solution 2 |
