Python: Merge 2 dataframes on 3 matching columns using fuzzy logic

I have 2 Excel sheets, A and B. Sheet A has three columns: column A with the product name and dose type, column B with the size, and column C with the country. Sheet B has a single column A that combines the product, dose type, size and country abbreviation.

Sheet 1 Columns:

name                                    size        Country
Brand Actified 100 mg/30 mg syrup        21          France


Sheet 2 Column:

Clubbed field
BRANDACTI 100mg/30mg 21 FR

The common field in df2 holds the product, size and country abbreviation. This is just a direct example; the real data is not consistent enough to map between the two tables, because some values are missing or are in a different format.

Solution I tried: I fuzzy matched each column separately and then combined all 3 columns into one. The three columns were the product name cropped into a separate column, the existing size column, and a cropped column with the country code.

But the issue is that matching on the combined 3-column string produces extra combinations with a high score, because the string is long. Example: both dataframes have 4 matching rows that differ only slightly in size (say by 2 or 4 units). Combining them gives 16 rows, because every combination scores above the threshold when the only difference in the long text is a small number.
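The effect described above can be reproduced with a minimal sketch. It assumes the standard library's `difflib.SequenceMatcher` as the fuzzy scorer (the question does not say which library was used), and the second row with size 25 is a made-up value for illustration.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] from difflib.SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical rows: same product and country, only the size differs.
left = "BRANDACTI 100mg/30mg 21 FR"
right = "BRANDACTI 100mg/30mg 25 FR"

# Scoring the whole combined string: one differing character out of 26
# barely lowers the score, so the pair still clears a high threshold.
combined_score = ratio(left, right)

# Scoring the size column on its own exposes the mismatch.
size_score = ratio("21", "25")

print(f"combined: {combined_score:.2f}, size only: {size_score:.2f}")
```

Because the long shared prefix dominates the combined score, every row pairs with every other row, which is how 4 rows on each side turn into 16 matches.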


Is there a way I can merge the 2 dataframes based on data matching all 3 columns with a fuzzy score?

In my case: combine the dataframes based on values matching all 3 columns (Name, Size and Country) with a fuzzy threshold score.
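That requirement could be sketched as a per-column check, where a pair of rows is kept only if every column passes its own test. This is a rough illustration under several assumptions, not a definitive implementation: rows are plain dicts (for example from `df.to_dict("records")`), the clubbed field always ends with the size and the country code, `COUNTRY_CODES` is a hypothetical lookup table, and `difflib` stands in for whichever fuzzy library is actually used. The names `fuzzy_merge`, `split_clubbed` and both thresholds are invented for this example.

```python
from difflib import SequenceMatcher

# Hypothetical lookup from country name to the abbreviation used in sheet 2.
COUNTRY_CODES = {"france": "FR", "germany": "DE"}


def normalize(text):
    """Lowercase and drop whitespace so '100 mg' and '100mg' compare equal."""
    return "".join(text.lower().split())


def name_score(a, b):
    """Fuzzy similarity of two product names in [0, 1]."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def split_clubbed(field):
    """Split 'NAME ... SIZE CC' into (name, size, code); None if malformed."""
    parts = field.rsplit(None, 2)
    if len(parts) != 3:
        return None
    name, size, code = parts
    try:
        return name, float(size), code.upper()
    except ValueError:
        return None


def fuzzy_merge(sheet1, sheet2, name_threshold=0.75, size_tol=0.0):
    """Keep only pairs where country, size AND name all match."""
    matches = []
    for row in sheet1:
        code = COUNTRY_CODES.get(row["country"].strip().lower())
        for field in sheet2:
            parsed = split_clubbed(field)
            if parsed is None:
                continue  # inconsistent row, skipped in this sketch
            name, size, field_code = parsed
            if code != field_code:
                continue  # country must agree
            if abs(float(row["size"]) - size) > size_tol:
                continue  # size is compared as a number, not as text
            score = name_score(row["name"], name)
            if score >= name_threshold:
                matches.append({**row, "clubbed": field, "name_score": score})
    return matches


sheet1 = [
    {"name": "Brand Actified 100 mg/30 mg syrup", "size": 21, "country": "France"},
]
sheet2 = [
    "BRANDACTI 100mg/30mg 21 FR",
    "BRANDACTI 100mg/30mg 25 FR",  # made-up row: wrong size
    "BRANDACTI 100mg/30mg 21 DE",  # made-up row: wrong country
]
merged = fuzzy_merge(sheet1, sheet2)
print(merged)
```

Comparing size as a number and country as a code means only the product name needs a fuzzy threshold, so a small size difference can no longer hide inside a long string. The nested loop compares every pair of rows, which may be too slow for large sheets.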

What is the best possible solution for this?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source