How to replace the values in one column with values from another table (using regex) in Python? Is it even possible?

I have two datasets. The first contains raw values that must be replaced with the acceptable values given in the second. If no matching acceptable value is found in the second dataset, the raw value should be left as it is.

The first looks like this:

SOURCE_ID TITLE
1 Emaar Beachfront
2 EmaarBeachfront
3 emaar beachfront
4 dubai hills estate
5 Dubai Hills
6 Nad Al Sheba
7 Nadalsheba
8 dubai hills residences
9 The Cove Ru
10 Homes

The second looks like this:

ID TITLE
1 Emaar Beachfront
2 Dubai Hills
3 Nad Al Sheba
4 The Cove

In the end, my dataset should look like this:

SOURCE_ID TITLE
1 Emaar Beachfront
2 Emaar Beachfront
3 Emaar Beachfront
4 Dubai Hills
5 Dubai Hills
6 Nad Al Sheba
7 Nad Al Sheba
8 Dubai Hills
9 The Cove
10 Homes

I thought this might be possible with regex, but I am not sure.



Solution 1:[1]

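One solution is to normalize both lists by removing spaces and lowercasing, then check whether a normalized acceptable title occurs as a substring of the normalized raw title. No regex is needed for this: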

first = ["Emaar Beachfront",
"EmaarBeachfront",
"emaar beachfront",
"dubai hills estate",
"Dubai Hills",
"Nad Al Sheba",
"Nadalsheba",
"dubai hills residences",
"The Cove Ru",
"Homes"]

second = [
"Emaar Beachfront",
"Dubai Hills",
"Nad Al Sheba",
"The Cove"
]

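# Normalized form of each acceptable title: no spaces, all lowercase.
second_transformed = [item.replace(" ", "").lower() for item in second]

out = []

for item in first:
    # Normalize the raw title the same way before comparing.
    item_transformed = item.replace(" ", "").lower()
    item_found = False
    for second_item, second_item_transformed in zip(second, second_transformed):
        # Substring match, so "dubaihillsestate" still matches "dubaihills".
        if second_item_transformed in item_transformed:
            out.append(second_item)
            item_found = True
            break
    # No acceptable title matched: keep the raw title unchanged.
    if not item_found:
        out.append(item)

print(out)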
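
Since the question asks about regex specifically, here is a rough sketch of the same idea using the standard `re` module. It is only one possible way to do it: each acceptable title is turned into a case-insensitive pattern that allows optional whitespace between its characters. The names `build_pattern`, `normalize_titles`, `raw_titles` and `accepted_titles` are made up for this example and do not come from the original answer.

```python
import re


def build_pattern(title):
    """Compile a case-insensitive pattern for one acceptable title.

    Whitespace in the title is dropped, and optional whitespace is
    allowed between every remaining character, so "Nad Al Sheba"
    also matches "Nadalsheba" and "nad al sheba".
    """
    chars = [re.escape(ch) for ch in title if not ch.isspace()]
    return re.compile(r"\s*".join(chars), re.IGNORECASE)


def normalize_titles(raw_titles, accepted_titles):
    """Replace each raw title with the first acceptable title found in it."""
    patterns = [(build_pattern(title), title) for title in accepted_titles]
    result = []
    for raw in raw_titles:
        for pattern, accepted in patterns:
            if pattern.search(raw):
                result.append(accepted)
                break
        else:
            # No pattern matched: keep the raw title unchanged.
            result.append(raw)
    return result


# Same sample data as in the question (hypothetical variable names).
raw_titles = [
    "Emaar Beachfront", "EmaarBeachfront", "emaar beachfront",
    "dubai hills estate", "Dubai Hills", "Nad Al Sheba", "Nadalsheba",
    "dubai hills residences", "The Cove Ru", "Homes",
]
accepted_titles = ["Emaar Beachfront", "Dubai Hills", "Nad Al Sheba", "The Cove"]

normalized = normalize_titles(raw_titles, accepted_titles)
print(normalized)
```

If the data lives in a pandas DataFrame, the same function could presumably be applied to the `TITLE` column and the result assigned back to it. As with the loop above, this is a substring-style match, so a very short acceptable title may match more raw titles than intended.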

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mbostic