Fill a column with its most frequent value
I want to fill a column 'col2' with its most frequent value, grouped by another column. However, this should not affect the other columns of the dataframe.
import pandas as pd
d = {'col1': ['green','green','green','blue','blue','blue'],'col2': ['gx','gx','ow','nb','nb','mj'],'col3': ['omg','omg','omg','qwe','qwe','omg'],'col4':['s','u','s','s','u','u']}
dftest = pd.DataFrame(data=d)
dftest
I ran the code below, which works for col1 and col2, but I have no idea how to keep the other columns intact.
dftest = dftest.groupby('col1')['col2'].apply(lambda x: x.value_counts().index[0]).reset_index()
Expected dataframe:
| col1 | col2 | col3 | col4 |
|---|---|---|---|
| green | gx | omg | s |
| green | gx | omg | u |
| green | gx | omg | s |
| blue | gx | qwe | s |
| blue | gx | qwe | u |
| blue | gx | omg | u |
Solution 1:[1]
Your expected output appears slightly off: in the original DataFrame, the most frequent col2 value for blue is nb, not gx. The following code should give you the desired output.
dftest.assign(
    col2=dftest.groupby("col1", as_index=False)["col2"].transform(
        lambda x: x.value_counts().idxmax()
    )
)
    col1  col2  col3  col4
0  green    gx   omg     s
1  green    gx   omg     u
2  green    gx   omg     s
3   blue    nb   qwe     s
4   blue    nb   qwe     u
5   blue    nb   omg     u
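As a rough alternative sketch (not part of the original answer), the per-group result from the question's own attempt can also be mapped back onto every row through col1. The names `most_frequent` and `result` are illustrative only, and if two values tie within a group, which one is picked depends on how `value_counts` orders them.

```python
import pandas as pd

# Data copied from the question.
dftest = pd.DataFrame(
    {
        "col1": ["green", "green", "green", "blue", "blue", "blue"],
        "col2": ["gx", "gx", "ow", "nb", "nb", "mj"],
        "col3": ["omg", "omg", "omg", "qwe", "qwe", "omg"],
        "col4": ["s", "u", "s", "s", "u", "u"],
    }
)

# One most-frequent col2 value per col1 group, similar to the question's attempt.
# The resulting Series is indexed by the group keys ("blue", "green").
most_frequent = dftest.groupby("col1")["col2"].agg(
    lambda s: s.value_counts().idxmax()
)

# Look up each row's group value via col1; col3 and col4 are left untouched.
# `result` is a new DataFrame, so dftest itself is not modified.
result = dftest.assign(col2=dftest["col1"].map(most_frequent))
print(result)
```

This keeps all six rows because the aggregation is only used as a lookup table, whereas the question's `apply(...).reset_index()` returns one row per group and so drops the other columns.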
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | gold_cy |
