Replace all occurrences of a character unless surrounded by two different patterns
I want to find a regex (preferably in Perl, but any flavour will do) that replaces every _ except those preceded by exactly 8 digits and followed by exactly 6 digits.
Concretely, I want to replace _ in filenames, except inside dates of the form YYYYMMDD_hhmmss.
More generally, I want to replace every occurrence of some character, except those that are both preceded by one pattern and followed by another.
I tried many regexes and searched the web a lot, but I did not find anything!
I know it is possible to replace every _ with ., then restore the _ in YYYYMMDD.hhmmss, but I am interested in doing it in one step (if that is possible).
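For comparison, the two-step workaround mentioned above could look roughly like the following Python sketch. The function name `replace_two_steps` is made up for illustration. One caveat: step 2 cannot tell a dot it created from a dot that was already in the name, so a filename that originally contained `20210302.005606` would come out with an underscore.

```python
import re


def replace_two_steps(name: str) -> str:
    """Hypothetical helper: replace every "_" by ".", then restore timestamps."""
    # Step 1: replace every underscore, including the one inside a timestamp.
    dotted = name.replace("_", ".")
    # Step 2: put the underscore back between exactly 8 and exactly 6 digits.
    return re.sub(r"(?<!\d)(\d{8})\.(\d{6})(?!\d)", r"\1_\2", dotted)


for name in ["foo_12345678_123456_bar", "12345678_1234567", "EPFL_AlgebreLineaire"]:
    print(name, "-->", replace_two_steps(name))
```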
Here are some examples of replacements:
Patate_17890505_TitreEnCamelCase.ext --> Patate.17890505.TitreEnCamelCase.ext
EPFL_AlgebreLineaire --> EPFL.AlgebreLineaire
ipe.20210302_005606.pdf --> ipe.20210302_005606.pdf
1_ --> 1.
12_ --> 12.
_1 --> .1
_12 --> .12
12345678_ --> 12345678.
_123456 --> .123456
12345678_12345 --> 12345678.12345
1234567_123456 --> 1234567.123456
1234567_12345 --> 1234567.12345
123456_12345 --> 123456.12345
12345678_1234567 --> 12345678.1234567
123456789_123456 --> 123456789.123456
123456789_1234567 --> 123456789.1234567
_patate__truc__ --> .patate..truc..
___ --> ...
foo_12345678 --> foo.12345678
foo_12345678_123456_bar --> foo.12345678_123456.bar
12345678_123456 --> 12345678_123456
foo12345678_123456bar --> foo12345678_123456bar
Below are a few of the regexes I tried.
This one does exactly the opposite of what I want, i.e. it replaces every _ preceded by exactly 8 digits and followed by exactly 6 digits (try it on regex101):
s/((?<!\d)(?:\d{8}))_((?:\d{6})(?!\d))/$1.$2/g
It works, so I need the negation of this regex…
Just a negative lookbehind and a negative lookahead (try it on regex101):
s/(?<!\d{8})_(?!\d{6})/./g
Fails: it does not replace the _ if it is preceded by 8 digits or followed by 6 digits, e.g. the _ is not replaced in these strings:
12345678_
_123456
12345678_12345
1234567_123456
I need "replace every _ except when both conditions hold" (and), but this regex does "replace every _ except when either condition holds" (or), so it misses some _.
Inspired by this answer (from python regex: match a char surrounded by exactly 2 chars) (try it on regex101):
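The and/or problem can be reproduced with a short Python sketch (Python's `re` accepts this fixed-width lookbehind). The flags `a` and `b` are illustrative names for "the lookbehind condition holds" and "the lookahead condition holds". The naive regex only replaces when both are false, whereas the goal is to replace whenever at least one of them is false.

```python
import re

# The naive attempt: a "_" is replaced only when it is NOT preceded by
# 8 digits AND NOT followed by 6 digits.
NAIVE = re.compile(r"(?<!\d{8})_(?!\d{6})")

# Strings from the question in which the "_" should become "." but does not.
samples = ["12345678_", "_123456", "12345678_12345", "1234567_123456"]

for s in samples:
    before, after = s.split("_")
    a = re.search(r"\d{8}$", before) is not None  # 8 digits right before "_"
    b = re.match(r"\d{6}", after) is not None     # 6 digits right after "_"
    naive = NAIVE.sub(".", s)
    wanted = s.replace("_", ".")
    print(f"{s!r}: A={a} B={b} naive={naive!r} wanted={wanted!r}")
```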
s/(?<!(?<!\d)\d{8})_(?!\d{6}(?!\d))/./g
Fails: same reason as the previous one.
The regex in the original answer works because it replaces characters that are preceded by a pre-pattern and followed by a post-pattern.
Inspired by this answer (from Replace character UNLESS surrounded by specific tag), but I do not really understand how it works (try it on regex101):
s/_(?:(?!(?:.*?\d{6}))|(?=[^\d]+\d{8}))/./g
Fails: in these examples, the _ is not replaced:
_123456
1234567_123456
12345678_1234567
123456789_123456
123456789_1234567
foo_12345678
The original problem is quite close to mine, but instead of \d{8} and \d{6}, the pre-pattern and post-pattern are HTML tags, so the problem is easier: <tag> and </tag> are unambiguous delimiters, whereas in my problem the post-pattern \d{6} could be followed by another digit (likewise, the pre-pattern \d{8} could be preceded by another digit).
But this one almost works: unlike the previous attempt, it replaces the _ in both of these strings:
12345678_
12345678_12345
so perhaps a modification could make it work as I want…
Solution 1:[1]
You can use
(?<!\d)\d{8}_\d{6}(?!\d)(*SKIP)(*F)|_
See the regex demo. Details:
(?<!\d)\d{8}_\d{6}(?!\d) - eight digits, _ and six digits, not enclosed with any other digits
(*SKIP)(*F) - fail the match at the current location and continue the regex search from the failure location
| - or
_ - an underscore in any other context.
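The `(*SKIP)(*F)` verbs are a Perl/PCRE feature and are not available in Python's built-in `re` module. A minimal sketch of the same "match what you want to keep first, then the character to replace" idea in Python is an alternation combined with a replacement callback. The names `PATTERN` and `replace_underscores` are assumptions made for this illustration, not part of the answer.

```python
import re

# First alternative: a full timestamp (exactly 8 digits, "_", exactly 6 digits).
# Second alternative: any underscore that was not consumed by the first one.
PATTERN = re.compile(r"(?<!\d)\d{8}_\d{6}(?!\d)|_")


def replace_underscores(name: str, repl: str = ".") -> str:
    """Hypothetical helper emulating the skip-and-fail idea with a callback."""
    # A lone "_" is replaced; a matched timestamp is put back unchanged.
    return PATTERN.sub(lambda m: repl if m.group(0) == "_" else m.group(0), name)


for name in ["foo_12345678_123456_bar", "123456789_123456", "12345678_1234567"]:
    print(name, "-->", replace_underscores(name))
```

Because the timestamp alternative consumes its underscore, the scan resumes after the timestamp, which is the role `(*SKIP)` plays in the PCRE pattern.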
An alternative regex is
_(?!(?<=(?<!\d)\d{8}_)\d{6}(?!\d))
See this regex demo. Details:
_ - an underscore
(?!(?<=(?<!\d)\d{8}_)\d{6}(?!\d)) - a negative lookahead that fails the match if, immediately to the right of the current location, there are six (and no more than six) digits immediately preceded by exactly eight digits and an underscore.
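This lookaround-only pattern uses nothing beyond fixed-width lookbehinds, so it should also be usable outside Perl/PCRE. As a rough check, the following Python sketch runs it against some of the examples from the question. The names `UNDERSCORE` and `cases` are illustrative.

```python
import re

# An underscore that is not the separator of an exact YYYYMMDD_hhmmss timestamp.
UNDERSCORE = re.compile(r"_(?!(?<=(?<!\d)\d{8}_)\d{6}(?!\d))")

# A subset of the input --> expected output pairs listed in the question.
cases = {
    "Patate_17890505_TitreEnCamelCase.ext": "Patate.17890505.TitreEnCamelCase.ext",
    "ipe.20210302_005606.pdf": "ipe.20210302_005606.pdf",
    "12345678_": "12345678.",
    "_123456": ".123456",
    "1234567_123456": "1234567.123456",
    "12345678_1234567": "12345678.1234567",
    "123456789_123456": "123456789.123456",
    "foo_12345678_123456_bar": "foo.12345678_123456.bar",
    "foo12345678_123456bar": "foo12345678_123456bar",
}

for source, expected in cases.items():
    result = UNDERSCORE.sub(".", source)
    status = "ok" if result == expected else "MISMATCH"
    print(f"{source} --> {result} [{status}]")
```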
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wiktor Stribiżew |
