pandas: Replace string is not replacing targeted substring

I am trying to iterate over the strings in one DataFrame (nlp_df) and remove every occurrence of them from the strings in a second DataFrame (nlp2_df).

for index, row in nlp_df.iterrows():
    print( row['x1'] )
    string1 = row['x1'].replace("(","\(")
    string1 = string1.replace(")","\)")
    string1 = string1.replace("[","\[")
    string1 = string1.replace("]","\]")
    nlp2_df['title'] = nlp2_df['title'].replace(string1,"")

The loop above is meant to check for, and replace, every string from df1 that occurs in df2.

The output below shows the strings in df1:

wait_timeout
interactive_timeout
pool_recycle
....
__all__
folder_name
re.compile('he(lo') 

The output below shows df2 after the replacement:

0   have you tried watching the traffic between th...
1   /dev/cu.xxxxx is the "callout" device, it's wh...
2               You'll want the struct package.\r\r\n

In df2, strings such as /dev/cu.xxxxx should have been replaced during the iteration, but as shown they are not removed. However, when I tried nlp2_df['title'] = nlp2_df['title'].replace("/dev/cu.xxxxx","") directly, the string was removed successfully.

Is there a reason why writing the string directly works, but looping with a variable as the replacement target does not?



Solution 1:[1]

IIUC you can simply use regular expressions:

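# regex=True is needed on pandas 2.0+, where str.replace matches literally by default
nlp2_df['title'] = nlp2_df['title'].str.replace(r'([\(\)\[\]])', r'\\\1', regex=True)

PS: you don't need a for loop at all.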

Demo:

In [15]: df
Out[15]:
           title
0  aaa (bbb) ccc
1   A [word] ...

In [16]: df['new'] = df['title'].str.replace(r'([\(\)\[\]])', r'\\\1', regex=True)

In [17]: df
Out[17]:
           title              new
0  aaa (bbb) ccc  aaa \(bbb\) ccc
1   A [word] ...   A \[word\] ...
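
On the original "why is nothing removed" part: as far as I can tell, Series.replace compares each whole cell against the target, while Series.str.replace works on substrings inside each cell. Below is a minimal sketch of that difference, plus a loop-free way to strip every term at once. The `terms` and `titles` Series are made-up stand-ins for nlp_df['x1'] and nlp2_df['title'], not the asker's real data.

```python
import re

import pandas as pd

# Hypothetical stand-ins for the question's nlp_df['x1'] and nlp2_df['title'].
terms = pd.Series(["wait_timeout", "/dev/cu.xxxxx", "re.compile('he(lo')"])
titles = pd.Series([
    "set wait_timeout first",
    '/dev/cu.xxxxx is the "callout" device',
    "You'll want the struct package.",
])

# Series.replace only swaps cells that equal the target exactly,
# so a target that is merely a substring leaves the cell untouched.
whole_cell = titles.replace("/dev/cu.xxxxx", "")

# Series.str.replace looks inside each string; regex=False treats the target literally.
substring = titles.str.replace("/dev/cu.xxxxx", "", regex=False)

# Loop-free removal of all terms: escape each term with re.escape (no manual
# backslashes), put longer terms first, and join them into one alternation.
ordered = sorted(terms, key=len, reverse=True)
pattern = "|".join(re.escape(t) for t in ordered)
cleaned = titles.str.replace(pattern, "", regex=True)

print(whole_cell[1])
print(substring[1])
print(cleaned.tolist())
```

Using re.escape avoids hand-escaping "(", ")", "[" and "]", which is easy to get wrong when a term such as re.compile('he(lo') contains an unbalanced parenthesis.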

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 MaxU - stop genocide of UA