How to fuzzywuzzy-match items in a DataFrame column of table A and merge them with the elements of table B?
Hi, I have a table of products and another table of product pricing. How would I use fuzzywuzzy matching to find each product's match, return the similarity score, and also add the matched productPricing table's columns?
Example tables:

products:

| product | category |
|---|---|
| colgate toothpaste 150gram | dental |

productPricing:

| productPricing | Price | Description | category |
|---|---|---|---|
| toothpaste whitening colgate 150gram | usd5 | tootpaste whitening colgate | dental |

Expected output:

| product | similarity score | productPricing | Price | Description |
|---|---|---|---|---|
| colgate toothpaste 150gram | 85 | toothpaste whitening colgate 150gram | USD 5 | White paste that can help.. |

I am using fuzz.token_set_ratio to determine the similarity score. If there's another way to do the matching and return the highest score, please advise.
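
A minimal sketch of one way to get the highest-scoring match, assuming both tables are pandas DataFrames with the column names from the example: score each product against every productPricing entry, keep the best one, and merge that row's columns back in. To keep the sketch runnable with only pandas and the standard library, `token_set_score` is a hypothetical stand-in that approximates the idea behind `fuzz.token_set_ratio` using `difflib`, so its scores will not always equal fuzzywuzzy's. With fuzzywuzzy installed, `fuzz.token_set_ratio` can be passed as `scorer` instead, and `process.extractOne(query, choices, scorer=fuzz.token_set_ratio)` performs a similar best-match lookup. `best_match` and `match_idx` are also illustrative names.

```python
import difflib

import pandas as pd


def _ratio(a, b):
    # Similarity of two strings on a 0-100 scale (difflib ratio, rounded)
    return round(100 * difflib.SequenceMatcher(None, a, b).ratio())


def token_set_score(a, b):
    """Hypothetical approximation of fuzz.token_set_ratio (stdlib only):
    compare the shared tokens with each string's shared-plus-leftover tokens
    and keep the best of the three scores."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    common = " ".join(sorted(tokens_a & tokens_b))
    rest_a = (common + " " + " ".join(sorted(tokens_a - tokens_b))).strip()
    rest_b = (common + " " + " ".join(sorted(tokens_b - tokens_a))).strip()
    return max(_ratio(common, rest_a), _ratio(common, rest_b), _ratio(rest_a, rest_b))


def best_match(query, choices, scorer=token_set_score):
    # Return (index label, score) of the highest-scoring entry in a Series
    scores = {idx: scorer(query, text) for idx, text in choices.items()}
    best_idx = max(scores, key=scores.get)
    return best_idx, scores[best_idx]


products = pd.DataFrame({"product": ["colgate toothpaste 150gram"], "category": ["dental"]})
pricing = pd.DataFrame({
    "productPricing": ["toothpaste whitening colgate 150gram"],
    "Price": ["usd5"],
    "Description": ["tootpaste whitening colgate"],
    "category": ["dental"],
})

# For each product, find the best pricing row and its similarity score
matches = products["product"].apply(lambda p: best_match(p, pricing["productPricing"]))
products["match_idx"] = [idx for idx, _ in matches]
products["similarity score"] = [score for _, score in matches]

# Pull in the matched pricing columns, then drop the helper columns
result = products.merge(
    pricing.drop(columns="category"), left_on="match_idx", right_index=True
).drop(columns=["match_idx", "category"])
print(result)
```

Scoring every pair costs one comparison per (product, productPricing) combination, so restricting candidates, for example to the same category, keeps this manageable on larger tables.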
Currently I am stuck here:
```python
import pandas as pd

# `product` and `scraped` are existing DataFrames that both have a 'brands' column
listMatch = []
for brandList in scraped['brands'].unique():
    # Rows of each table that belong to the current brand
    productItem = list(product[product['brands'] == brandList].itertuples(index=False))
    scrapedItem = list(scraped[scraped['brands'] == brandList].itertuples(index=False))
    # Pair every product row with every scraped row of the same brand
    for item in productItem:
        for pro in scrapedItem:
            listMatch.append(item + pro)  # tuple concatenation of both rows

# Note: 'unitPrice' appears twice, so df['unitPrice'] returns two columns
df = pd.DataFrame(listMatch, columns=['sku', 'product', 'Brands', 'productScraped', 'unit', 'unitPrice',
                                      'web', 'date', 'brands2', 'unitPrice', 'salesPrice', 'normalPrice'])
```
So the main idea is to compare both tables within the same category, get the highest similarity value, and append it to a new DataFrame.
Done, I got it to work.
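The category-based idea can be sketched as a merge on the shared column followed by a per-product maximum, assuming both DataFrames have a `category` column as in the example tables. `sorted_token_score` is a hypothetical difflib-based scorer (closer in spirit to a token-sort comparison than to token_set_ratio); `fuzz.token_set_ratio` could be used in its place. The second productPricing row is made-up data added only so there is more than one candidate to choose from.

```python
import difflib

import pandas as pd


def sorted_token_score(a, b):
    # Hypothetical scorer: difflib ratio (0-100) after sorting each string's tokens
    a_sorted = " ".join(sorted(a.lower().split()))
    b_sorted = " ".join(sorted(b.lower().split()))
    return round(100 * difflib.SequenceMatcher(None, a_sorted, b_sorted).ratio())


products = pd.DataFrame({
    "product": ["colgate toothpaste 150gram"],
    "category": ["dental"],
})
# Illustrative data: the second row is made up to give the matcher a choice
pricing = pd.DataFrame({
    "productPricing": ["toothpaste whitening colgate 150gram", "colgate toothbrush soft"],
    "Price": ["usd5", "usd3"],
    "category": ["dental", "dental"],
})

# Pair each product only with pricing rows of the same category
pairs = products.merge(pricing, on="category")
pairs["similarity score"] = [
    sorted_token_score(p, q) for p, q in zip(pairs["product"], pairs["productPricing"])
]

# Keep the highest-scoring pricing row for each product
best = pairs.loc[pairs.groupby("product")["similarity score"].idxmax()].reset_index(drop=True)
print(best[["product", "similarity score", "productPricing", "Price"]])
```

`groupby(...).idxmax()` returns the row label of the highest score in each group, so ties resolve to the first such row.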
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
