Capture the first three unique values from each row in a pandas dataframe
I have a pandas dataframe like the one below:
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'C'],
                   'col2': ['A', 'B'],
                   'col3': ['B', 'B'],
                   'col4': ['A', 'C'],
                   'col5': ['C', 'F'],
                   'col6': ['D', 'D'],
                   'col7': ['E', 'G'],
                   'col8': ['E', 'H']})
col1 col2 col3 col4 col5 col6 col7 col8
A A B A C D E E
C B B C F D G H
I need to generate another dataframe in which each row holds the first three unique values of the corresponding row of the original dataframe.
This is what I need:
fea1 fea2 fea3
A B C
C B F
I spent hours on this and was not able to find a solution. Does anyone know how to achieve this? Thanks a lot in advance.
Solution 1:[1]
In your case, call unique on each row and keep the first three values:
df = df.apply(lambda x : pd.Series(x.unique()[:3]),axis=1)
Out[96]:
0 1 2
0 A B C
1 C B F
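The result above carries the default column labels 0, 1, 2 rather than the fea1, fea2, fea3 names asked for in the question. A minimal sketch of one way to rename them, assuming the sample data from the question (the variable name out is introduced here for illustration only):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'col1': ['A', 'C'], 'col2': ['A', 'B'],
                   'col3': ['B', 'B'], 'col4': ['A', 'C'],
                   'col5': ['C', 'F'], 'col6': ['D', 'D'],
                   'col7': ['E', 'G'], 'col8': ['E', 'H']})

# First three unique values per row, in order of first appearance
out = df.apply(lambda row: pd.Series(row.unique()[:3]), axis=1)

# Replace the default integer labels with the requested names
out.columns = ['fea1', 'fea2', 'fea3']
print(out)
```

Assigning to a new name instead of overwriting df keeps the original data available for comparison.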
Solution 2:[2]
Aggregate each row with unique, then build a new dataframe with the desired column names:
pd.DataFrame(df.agg(lambda x: x.unique()[:3], axis=1).to_list(), columns=['fea1' ,'fea2' , 'fea3'])
fea1 fea2 fea3
0 A B C
1 C B F
Solution 3:[3]
Another option is to use drop_duplicates with result_type='expand' (this works only if every row has at least 3 unique values):
out = df.apply(lambda x: x.drop_duplicates().to_numpy()[:3], axis=1, result_type='expand')
For the general case:
out = pd.DataFrame(df.apply(lambda x: x.drop_duplicates().to_numpy()[:3], axis=1).tolist())
Output:
0 1 2
0 A B C
1 C B F
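If some rows might have fewer than three unique values, one way to keep the result rectangular is to pad the short rows explicitly. The sketch below does not come from any of the answers: first_unique, its parameters n and prefix, and the small frame named short are hypothetical names used for illustration, and it swaps in dict.fromkeys for order-preserving de-duplication in place of unique or drop_duplicates.

```python
import pandas as pd

def first_unique(df, n=3, prefix="fea"):
    """Return the first n unique values of each row, padded with None."""
    rows = []
    for values in df.to_numpy():
        # dict.fromkeys keeps the first occurrence of each value, in order
        uniq = list(dict.fromkeys(values))[:n]
        # Pad rows that have fewer than n unique values
        uniq += [None] * (n - len(uniq))
        rows.append(uniq)
    cols = [f"{prefix}{i + 1}" for i in range(n)]
    return pd.DataFrame(rows, columns=cols, index=df.index)

# Sample data from the question
df = pd.DataFrame({'col1': ['A', 'C'], 'col2': ['A', 'B'],
                   'col3': ['B', 'B'], 'col4': ['A', 'C'],
                   'col5': ['C', 'F'], 'col6': ['D', 'D'],
                   'col7': ['E', 'G'], 'col8': ['E', 'H']})
result = first_unique(df)
print(result)

# Hypothetical row with only two unique values; the third cell is missing
short = pd.DataFrame({'col1': ['X'], 'col2': ['X'], 'col3': ['Y']})
padded = first_unique(short)
print(padded)
```

Padding inside the helper means the output always has exactly n columns, whatever the data looks like.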
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | BENY |
| Solution 2 | wwnde |
| Solution 3 |
