'Strict regex in Pandas replace

I need to write a strict regular expression to replace certain values in my pandas dataframe. This is an issue that was raised after solving the question that I posted here.

The issue is that the .replace(idsToReplace, regex=True) is not strict. Therefore if the iDsToReplace are:

NY : New York
NYC : New York City

and the comment in which we are replacing the ID is:

My cat from NYC is large.

The resulting response is:

My cat from New York is large.

Is there a pythonic way within the pandas replace function to make the regular expression stricter to match with NYC and not NY?



Solution 1:[1]

Add \b for word boundaries to each key of dict:

d = {'UK': 'United Kingdom', 'LA': 'Los Angeles', 'NYC': 'New York City', 'NY' : 'New York'}

data = {'Categories': ['animal','plant','object'],
    'Type': ['tree','dog','rock'],
        'Comment': ['The NYC tree is very big', 'NY The cat from the UK is small',
                    'The rock was found in LA.']
}

d = {r'\b' + k + r'\b':v for k, v in d.items()}

df = pd.DataFrame(data)

df['commentTest'] = df['Comment'].replace(d, regex=True)
print (df)
  Categories                          Comment  Type  \
0     animal         The NYC tree is very big  tree   
1      plant  NY The cat from the UK is small   dog   
2     object        The rock was found in LA.  rock   

                                         commentTest  
0                 The New York City tree is very big  
1  New York The cat from the United Kingdom is small  
2                 The rock was found in Los Angeles.  

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1