'pandas: Replace string is not replacing targeted substring
I am trying to iterate over a list of strings in dataframe1 (nlp_df) and, for each one, remove any occurrence of it from the titles in dataframe2 (nlp2_df).
for index, row in nlp_df.iterrows():
    print(row['x1'])
    string1 = row['x1'].replace("(","\(")
    string1 = string1.replace(")","\)")
    string1 = string1.replace("[","\[")
    string1 = string1.replace("]","\]")
    nlp2_df['title'] = nlp2_df['title'].replace(string1,"")
To do this, I loop over df1 with the code above and replace each of its strings in df2, escaping parentheses and square brackets first.
The output below shows the strings in df1:
wait_timeout
interactive_timeout
pool_recycle
....
__all__
folder_name
re.compile('he(lo')
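A minimal sketch (standard library only) of what the hand-written escaping in the loop produces, using the last df1 string listed above. The backslashes only mean "literal parenthesis" when the string is interpreted as a regular expression; as a plain string they are extra characters, so the escaped text no longer occurs in the titles. re.escape is shown as an alternative that also escapes '.', which is a regex wildcard.

```python
import re

# The last df1 string from the list above.
s = "re.compile('he(lo')"

# Same escaping as the loop; raw strings give the same value as "\(" etc.
# but avoid Python's invalid-escape-sequence warning.
manual = s.replace("(", r"\(").replace(")", r"\)").replace("[", r"\[").replace("]", r"\]")
print(manual)

# As a plain substring, the escaped text does not occur in the original.
print(manual in s)

# Interpreted as a regex, the same text does match the original.
print(re.search(manual, s) is not None)

# re.escape escapes every regex metacharacter, including '.'.
print(re.escape("/dev/cu.xxxxx"))
```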
The output below shows df2 after the replacement loop:
0 have you tried watching the traffic between th...
1 /dev/cu.xxxxx is the "callout" device, it's wh...
2 You'll want the struct package.\r\r\n
In this df2 output, strings such as /dev/cu.xxxxx should have been removed during the loop, but they are still there. However, when I write the string directly, nlp2_df['title'] = nlp2_df['title'].replace("/dev/cu.xxxxx",""), it is removed successfully.
Why does writing the string directly work, while looping and passing it through a variable does not?
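One relevant difference, shown on a made-up two-row Series: Series.replace compares whole cell values unless regex=True is passed, whereas Series.str.replace works on substrings. This sketch only illustrates the pandas behaviour; the data is hypothetical, not the asker's df2.

```python
import pandas as pd

# Hypothetical titles: one contains the target as a substring, one is exactly the target.
titles = pd.Series([
    "/dev/cu.xxxxx is the callout device",
    "/dev/cu.xxxxx",
])

# Series.replace (regex=False by default) only matches whole cell values.
whole = titles.replace("/dev/cu.xxxxx", "")

# Series.str.replace works on substrings; regex=False treats the pattern literally.
sub = titles.str.replace("/dev/cu.xxxxx", "", regex=False)

# With regex=True, Series.replace substitutes regex matches inside each string.
rx = titles.replace(r"/dev/cu\.xxxxx", "", regex=True)

print(whole.tolist())
print(sub.tolist())
print(rx.tolist())
```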
Solution 1:[1]
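IIUC you can simply use regular expressions (pass regex=True explicitly, because since pandas 2.0 str.replace treats the pattern as a literal string by default):
nlp2_df['title'] = nlp2_df['title'].str.replace(r'([\(\)\[\]])', r'\\\1', regex=True)
PS: you don't need a for loop at all.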
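If the goal is to remove the df1 strings from df2 rather than to escape brackets, a loop-free sketch is to escape each string with re.escape and join them into a single alternation pattern. The to_remove and titles Series below are hypothetical stand-ins for nlp_df['x1'] and nlp2_df['title'].

```python
import re
import pandas as pd

# Hypothetical stand-ins for nlp_df['x1'] (strings to remove) and nlp2_df['title'].
to_remove = pd.Series(["/dev/cu.xxxxx", "re.compile('he(lo')", "[x]"])
titles = pd.Series([
    "/dev/cu.xxxxx is the callout device",
    "use re.compile('he(lo') here",
    "list [x] item",
])

# Escape each string so it is matched literally, then join into one alternation.
# Longest strings go first so a longer string wins over a shorter prefix of it.
pattern = "|".join(re.escape(s) for s in sorted(to_remove, key=len, reverse=True))

# One vectorised pass over the column instead of one replace per df1 row.
cleaned = titles.str.replace(pattern, "", regex=True)
print(cleaned.tolist())
```

Sorting longest-first matters because regex alternation tries alternatives left to right and stops at the first one that matches.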
Demo:
In [15]: df
Out[15]:
title
0 aaa (bbb) ccc
1 A [word] ...
In [16]: df['new'] = df['title'].str.replace(r'([\(\)\[\]])', r'\\\1', regex=True)
In [17]: df
Out[17]:
title new
0 aaa (bbb) ccc aaa \(bbb\) ccc
1 A [word] ... A \[word\] ...
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | MaxU - stop genocide of UA |
