'Columns names of top n values of each row in a pandas DataFrame
Data:
d = {'a': [1,5], 'b': [2,4], 'c': [3,3], 'd':[4,2], 'e': [5,1]}
df = pd.DataFrame(d)
Desired Output:
d2 = {'a': [1,5], 'b': [2,4], 'c': [3,3], 'd':[4,2], 'e': [5,1], 'Top (First)': ['e','a'], 'Top (Second)': ['d','b'], 'Top (Third)': ['c','c']}
df2 = pd.DataFrame(d2)
I would like to create 3 columns - Top (First), Top (Second), Top (Third) that return the column names of the top 3 highest values for that row. I tried an np.where and even though that works, it becomes exponentially longer with more columns so looking for a simpler solution.
Solution 1:[1]
You could use nlargest to find the 3 largest values, then get the index of the largest values (which are column names since we apply nlargest row-wise) and build DataFrame and join it back to df:
df2 = df.join(pd.DataFrame(df.apply(lambda x: x.nlargest(3).index.tolist(), axis=1).tolist(),
columns=['Top (First)','Top (Second)','Top (Third)']))
Output:
a b c d e Top (First) Top (Second) Top (Third)
0 1 2 3 4 5 e d c
1 5 4 3 2 1 a b c
Solution 2:[2]
enke's answer is good enough solving the question within one library. Here's how you do with Numpy if you are feeling spicy lol.
import numpy as np
pd.concat([df,pd.DataFrame(df.apply(lambda x:list(df.columns[np.array(x).argsort()[::-1][:3]]), axis=1).to_list(), columns=['Top (First)', 'Top (Second)', 'Top (Third)'])], axis=1)
a b c d e Top (First) Top (Second) Top (Third)
0 1 2 3 4 5 e d c
1 5 4 3 2 1 a b c
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | hteza |
