Pandas: Get top n columns based on a row's values
I have a DataFrame with a single row and need to reduce it to a smaller one that keeps only the columns with the largest values in that row.
What's the most efficient way?
df = pd.DataFrame({'a':[1], 'b':[10], 'c':[3], 'd':[5]})
| a | b | c | d |
|---|---|---|---|
| 1 | 10 | 3 | 5 |
For example, selecting the top-3 features should give:
| b | c | d |
|---|---|---|
| 10 | 3 | 5 |
Solution 1:[1]
Sort the columns by the values in the row (descending) and select the first 3:
df1 = df.sort_values(0, axis=1, ascending=False).iloc[:, :3]
print (df1)
b d c
0 10 5 3
Solution with Series.nlargest:
df1 = df.iloc[0].nlargest(3).to_frame().T
print (df1)
b d c
0 10 5 3
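Both variants above return the columns ordered by value (b, d, c), while the example in the question keeps the original column order (b, c, d). If that order matters, one possible approach is to take only the labels from nlargest and use them as a column mask. This is a minimal sketch, not part of the original answer; the names top and df1 are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [10], 'c': [3], 'd': [5]})

# Labels of the 3 largest values in the first (and only) row
top = df.iloc[0].nlargest(3).index

# A boolean mask over df.columns keeps the original left-to-right order
df1 = df.loc[:, df.columns.isin(top)]
print(df1)
```

The mask selects columns in the order they already appear in df, so no re-sorting is needed afterwards.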
Solution 2:[2]
You can transpose with .T, call nlargest() on the column labeled 0, and transpose back:
new = df.T.nlargest(columns = 0, n = 3).T
print(new)
b d c
0 10 5 3
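When several columns share the same value, nlargest decides which ones to keep through its keep parameter. The sketch below uses made-up data with a tie (not the data from the question) to show the difference between the default keep='first' and keep='all'; the name df_ties is illustrative.

```python
import pandas as pd

# Illustrative data (not from the question): 'c' and 'd' tie for second place
df_ties = pd.DataFrame({'a': [1], 'b': [10], 'c': [5], 'd': [5]})

row = df_ties.iloc[0]

# Default keep='first': exactly n labels, the earliest tied column is kept
top_first = row.nlargest(2).index.tolist()
print(top_first)   # ['b', 'c']

# keep='all': every column tied with the n-th value is retained,
# so the result can contain more than n labels
top_all = row.nlargest(2, keep='all').index.tolist()
print(top_all)     # ['b', 'c', 'd']
```

With keep='all' the result may be wider than n columns, which is worth checking if later code assumes a fixed width.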
Solution 3:[3]
You can use np.argsort. Applied to the negated values, as in the code below, it returns the column positions ordered from the largest value to the smallest. Slicing then keeps the positions of the n largest values.
import pandas as pd
import numpy as np
# Your dataframe
df = pd.DataFrame({'a':[1], 'b':[10], 'c':[3], 'd':[5]})
# Pick the number n to find n largest values
nlargest = 3
# Get the order of the largest value columns by their indices
order = np.argsort(-df.values, axis=1)[:, :nlargest]
# Find the columns with the largest values
# (order[0] is the 1-D first row; recent pandas versions reject 2-D indexing on an Index)
top_features = df.columns[order[0]].tolist()
# Filter the dataframe by the columns
top_features_df = df[top_features]
top_features_df
output:
b d c
0 10 5 3
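To make the argsort step easier to follow, the intermediate arrays can be printed one at a time. This is a small sketch on the same data; the names full_order and top_positions are illustrative, and the negation trick assumes numeric values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [10], 'c': [3], 'd': [5]})

# argsort sorts ascending, so negating the values ranks the largest first
full_order = np.argsort(-df.to_numpy(), axis=1)
print(full_order)      # [[1 3 2 0]]

# Keep the first 3 positions of the single row
top_positions = full_order[0, :3]
print(top_positions)   # [1 3 2]

# Positions map back to column labels through df.columns
top_labels = df.columns[top_positions].tolist()
print(top_labels)      # ['b', 'd', 'c']
```

Position 1 is column b (10), position 3 is d (5), and position 2 is c (3), which matches the output above.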
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | sophocles |
| Solution 3 | |
