Replace the value of a column which is a close match to another column in pandas
There is a dataframe as follows:
ID  Mat_Des                            Matched_Des                         Price  Score
1   4-STROKE 25HP OB MOTOR FOR GEMINI  4- STROKE 25 HP OB MOTER FOR GEMNI  10000  100
2   OBS for 25HP OBM                   STANDARD TOOL KIT                   5000   94
3   Accessories for 25HP OBM           SERVICE ENGINEERING                 5000   54
4   Standard Tool Kit for 25HP OBM     PBS DOCUMENTATION                   1000
5   OWNER’S MANUAL (IN ENGLISH)
The Score is derived from fuzzy matching logic that compares Mat_Des and Matched_Des using fuzz.partial_ratio, with the threshold set to 85.
I want a resultant dataframe in which the Matched_Des column is dropped, but the Price is assigned accordingly. So the resultant dataframe would be:
ID  Mat_Des                            Price
1   4-STROKE 25HP OB MOTOR FOR GEMINI  10000
2   OBS for 25HP OBM                   5000
3   Accessories for 25HP OBM           0
4   Standard Tool Kit for 25HP OBM     5000
5   OWNER’S MANUAL (IN ENGLISH)        0
Please note that for the Mat_Des "Standard Tool Kit for 25HP OBM" the Price is set to 5000, because that is the Price listed for "STANDARD TOOL KIT" under Matched_Des.
To start, I want to use an approach like:
df['Mat_Des'] = np.where(df['Score']>85, df['Matched_Des'],df['Mat_Des'])
But the above approach would replace OBS for 25HP OBM with STANDARD TOOL KIT, which is not what I want.
Any clue on how to address this?
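One possible direction, as a minimal sketch rather than a definitive answer: treat the Matched_Des and Price columns as a lookup table and search it for every Mat_Des. The rule below is an assumption read off the expected output: keep the row's own Price when its Score is above 85, otherwise take the Price of the best-matching Matched_Des from any row if that match is above 85, otherwise 0. The helper partial_score is hypothetical; it is a standard-library stand-in that only approximates fuzz.partial_ratio, so the real scorer would be swapped in and may give different scores.

```python
from difflib import SequenceMatcher

import pandas as pd

THRESHOLD = 85  # same cut-off as in the question


def partial_score(a, b):
    """Hypothetical stand-in for fuzz.partial_ratio (approximation only):
    slide the shorter string over the longer one and keep the best ratio."""
    a, b = a.lower(), b.lower()
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    if not short:
        return 0
    best = 0.0
    for i in range(len(long_) - len(short) + 1):
        window = long_[i:i + len(short)]
        ratio = SequenceMatcher(None, short, window, autojunk=False).ratio()
        best = max(best, ratio)
    return round(100 * best)


# Sample data from the question; missing cells are None.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Mat_Des": [
        "4-STROKE 25HP OB MOTOR FOR GEMINI",
        "OBS for 25HP OBM",
        "Accessories for 25HP OBM",
        "Standard Tool Kit for 25HP OBM",
        "OWNER’S MANUAL (IN ENGLISH)",
    ],
    "Matched_Des": [
        "4- STROKE 25 HP OB MOTER FOR GEMNI",
        "STANDARD TOOL KIT",
        "SERVICE ENGINEERING",
        "PBS DOCUMENTATION",
        None,
    ],
    "Price": [10000, 5000, 5000, 1000, None],
    "Score": [100, 94, 54, None, None],
})

# Lookup table of every known description and its price.
catalog = df.dropna(subset=["Matched_Des", "Price"])[["Matched_Des", "Price"]]


def lookup_price(row):
    # Assumed rule 1: a row whose own Score passes the threshold keeps its Price.
    if pd.notna(row["Score"]) and row["Score"] > THRESHOLD:
        return row["Price"]
    # Assumed rule 2: otherwise search all Matched_Des values for the best match.
    best_score, best_price = 0, 0
    for desc, price in zip(catalog["Matched_Des"], catalog["Price"]):
        score = partial_score(row["Mat_Des"], desc)
        if score > best_score:
            best_score, best_price = score, price
    return best_price if best_score > THRESHOLD else 0


prices = df.apply(lookup_price, axis=1).astype(int)
result = df.assign(Price=prices).drop(columns=["Matched_Des", "Score"])
print(result)
```

Mat_Des itself is never overwritten here; only Price is recomputed, which avoids the problem with the np.where attempt. The row-by-row search is quadratic in the number of rows, so for a large frame a dedicated fuzzy-matching library would likely be a better fit.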
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow