How to not match a substring in any location of the main string
This might seem to be a repetitive question here but I have tried all other SO posts and the suggestions are not working for me.
Basically, I want to exclude strings that have a particular substring in them, either at the beginning, middle or at the end.
Here is an example:
Max_Num_HR, HR_Max_Num, Max_HR_Num
I want to exclude the strings that contain either _HR (at the end), HR_ (at the beginning), or _HR_ (in between).
What I have tried so far: r"(^((?!HR_).*))(?<!_HR)$"
This will successfully exclude strings that have HR_ (at the beginning) and _HR (at the end), but not _HR_ (in between)
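A quick way to reproduce this behaviour is sketched below. It assumes Python's re module (suggested by the r"..." literal above); the names `original` and `results` are only illustrative.

```python
import re

# The pattern from the question, unchanged.
original = re.compile(r"(^((?!HR_).*))(?<!_HR)$")

# The three test strings from the question.
names = ["Max_Num_HR", "HR_Max_Num", "Max_HR_Num"]

# True means the pattern matched, i.e. the string was NOT excluded.
results = {name: bool(original.search(name)) for name in names}
print(results)
# {'Max_Num_HR': False, 'HR_Max_Num': False, 'Max_HR_Num': True}
```

Only Max_HR_Num still matches, because nothing in the pattern checks for _HR_ in the middle of the string.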
I have looked at How to exclude a string in the middle of a RegEx string?
But their solution did not seem to work for me.
I understand that the first segment of my code (^((?!HR_).*)) will exclude everything that contains HR_ since I have a ^ at the beginning followed by a negative lookahead. The second segment (?<!_HR)$ will begin at the end of the string and perform a negative lookbehind to see if _HR is not included at the end. Going with this train of thought, I tried including (?!_HR_) in between the two segments, but to no avail.
So, how do I get it to exclude all three HR_, _HR_, _HR considering Max_Num_HR, HR_Max_Num, Max_HR_Num as the test case?
Solution 1:[1]
The pattern is missing the assertion for _HR_ somewhere in the string.
You can place the negative lookbehind after the dollar sign, as in $(?<!_HR), so that it asserts there is no _HR at the end of the string; this prevents some backtracking over the .+
Note that for a match only you don't need the capture groups.
^(?!HR_)(?!.*_HR_).+$(?<!_HR)
- `^` Start of string
- `(?!HR_)` Assert not `HR_` at the start
- `(?!.*_HR_)` Assert not `_HR_` in the string
- `.+$` Match 1+ chars to not match an empty string, and assert end of string
- `(?<!_HR)` Assert not `_HR` to the left
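As a minimal sketch, the pattern can be checked against the three test strings from the question. This assumes Python's re module; the helper name `is_allowed` and the extra control string Max_Num are illustrative additions, not part of the original answer.

```python
import re

# Pattern from the answer: reject a leading HR_, any _HR_, and a trailing _HR.
PATTERN = re.compile(r"^(?!HR_)(?!.*_HR_).+$(?<!_HR)")


def is_allowed(name: str) -> bool:
    """Return True when the name passes all three HR checks."""
    return PATTERN.search(name) is not None


# The three names from the question, plus Max_Num as an assumed control
# that contains no HR part at all.
names = ["Max_Num_HR", "HR_Max_Num", "Max_HR_Num", "Max_Num"]
allowed = [name for name in names if is_allowed(name)]
print(allowed)  # ['Max_Num']
```

Because the pattern uses `.+` rather than `.*`, an empty string is rejected as well.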
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | The fourth bird |
