Python: Merge 2 dataframes on 3 matching columns using fuzzy logic
I have two Excel sheets, A and B. Sheet A has three columns: column A holds the product name and dose type, column B the size, and column C the country. Sheet B has a single column that combines the product, dose type, size, and country abbreviation.
Sheet 1 Columns:
| name | size | Country |
|---|---|---|
| Brand Actified 100 mg/30 mg syrup | 21 | France |
Sheet 2 Column:
Clubbed field
BRANDACTI 100mg/30mg 21 FR
In df2, the single combined field contains the product, size, and country abbreviation. This is just a straightforward example; the real data is not consistent enough to map directly between the two tables, because some values are missing or are in a different format.
Solution I tried: I fuzzy matched each column separately, then combined all 3 columns into one. The three columns were the product name cropped into a separate column, the existing Size column, and the country code cropped into its own column.
But the issue is that matching on the combined 3 columns produces extra combinations that all score above a high threshold, because the combined string is long. For example: both dataframes have 4 matching rows that differ only slightly in size (say, by 2 or 4 units). Combining gives 16 rows, because every combination scores above the threshold: the only difference in the long text is a small number.
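The effect described above can be reproduced with a small sketch. It assumes Python's standard-library `difflib.SequenceMatcher` as the scorer (the fuzzy library actually used is not stated, so its scores may differ), and the second key with size 25 is made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1 for two strings."""
    return SequenceMatcher(None, a, b).ratio()


# Two combined keys that differ only in the size value (21 vs 25).
# key_b is an invented row, used only to illustrate the problem.
key_a = "BRANDACTI 100mg/30mg 21 FR"
key_b = "BRANDACTI 100mg/30mg 25 FR"

# The long shared text dominates the score of the combined key.
combined_score = similarity(key_a, key_b)

# Scoring the size values on their own exposes the difference.
size_score = similarity("21", "25")

print(f"combined: {combined_score:.2f}, size only: {size_score:.2f}")
```

The combined key scores far higher than the size column does on its own, which is why a single threshold on the concatenated string lets every size variant through.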
Is there a way I can merge the 2 dataframes based on data matching on all 3 columns with a fuzzy score?
In my case: combine the dataframes based on values matching on all 3 columns (Name, Size, and Country) with a fuzzy threshold score.
What is the best possible solution for this?
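One possible direction, sketched under assumptions rather than offered as a definitive answer, is to score each column separately and accept a pair only when every column passes its own check. The sketch assumes the combined field of the second sheet has already been split into name, size, and country code. Plain dicts stand in for dataframe rows (for example from `df.to_dict("records")`). The `COUNTRY_CODES` lookup, the `fuzzy_merge` helper, the 0.75 name threshold, and the rows with size 25 are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical lookup from full country names to abbreviations.
COUNTRY_CODES = {"France": "FR"}


def normalize(text: str) -> str:
    """Lowercase and drop whitespace so '100 mg' and '100mg' compare equal."""
    return "".join(text.lower().split())


def name_score(a: str, b: str) -> float:
    """Fuzzy similarity (0 to 1) between two normalized product names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_merge(left, right, name_threshold=0.75):
    """Pair rows only when country, size, and name all agree."""
    matches = []
    for i, l_row in enumerate(left):
        for j, r_row in enumerate(right):
            # Country: exact match after mapping the name to its code.
            if COUNTRY_CODES.get(l_row["country"]) != r_row["country"]:
                continue
            # Size: exact match, so 21 never pairs with 25.
            if l_row["size"] != r_row["size"]:
                continue
            # Name: fuzzy match against its own threshold.
            score = name_score(l_row["name"], r_row["name"])
            if score >= name_threshold:
                matches.append((i, j, round(score, 2)))
    return matches


# Illustrative rows; the size-25 rows are invented for this example.
sheet_a = [
    {"name": "Brand Actified 100 mg/30 mg syrup", "size": 21, "country": "France"},
    {"name": "Brand Actified 100 mg/30 mg syrup", "size": 25, "country": "France"},
]
sheet_b = [
    {"name": "BRANDACTI 100mg/30mg", "size": 21, "country": "FR"},
    {"name": "BRANDACTI 100mg/30mg", "size": 25, "country": "FR"},
]

matches = fuzzy_merge(sheet_a, sheet_b)
print(matches)
```

With these sample rows, each row pairs only with the row of the same size, so two pairs come back instead of four. The nested loop compares every row with every other row, which is fine for small sheets but may need a blocking step (for example grouping by country first) on larger data.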
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
