Invalid regexp in R
I'm trying to use this regexp in R:
\?(?=([^'\\]*(\\.|'([^'\\]*\\.)*[^'\\]*'))*[^']*$)
I'm escaping like so:
\\?(?=([^'\\\\]*(\\\\.|'([^'\\\\]*\\\\.)*[^'\\\\]*'))*[^']*$)
I get an invalid regexp error.
RegexPal has no problem with the regex, and I've checked that the regex echoed in R's error message is exactly the same as the one I'm using in RegexPal, so I'm sort of at a loss. I don't think the escaping is the problem.
Code:
output <- sub("\\?(?=([^'\\\\]*(\\\\.|'([^'\\\\]*\\\\.)*[^'\\\\]*'))*[^']*$)", "!", "This is a test string?")
Solution 1:[1]
By default, R uses POSIX (Portable Operating System Interface) extended regular expressions, implemented by the TRE library (see these SO posts [1,2] and ?regex [caveat emptor: machete-level density ahead]).
Lookahead ((?=...)), lookbehind ((?<=...)) and their negations ((?!...) and (?<!...)) are probably the most salient examples of PCRE (Perl-Compatible Regular Expressions) constructs that POSIX does not support. That is why the default engine rejects your pattern as an invalid regexp, even though RegexPal (which uses a Perl-style flavor) accepts it.
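A quick way to see the POSIX/PCRE split for yourself is to try a tiny lookahead pattern under both engines. The sketch below uses a hypothetical helper, `compiles()` (not part of base R), that wraps `grepl()` in `tryCatch()` and reports whether the pattern was accepted instead of letting the error propagate.

```r
# compiles() is a hypothetical helper for illustration: it returns TRUE if
# grepl() accepts `pattern` under the chosen engine, FALSE if it fails.
compiles <- function(pattern, perl = FALSE) {
  tryCatch(
    {
      grepl(pattern, "", perl = perl)
      TRUE
    },
    # The default (TRE/POSIX) engine reports an unparseable pattern as an
    # error, typically accompanied by a warning; treat either as a failure.
    error = function(e) FALSE,
    warning = function(w) FALSE
  )
}

compiles("a(?=b)")                # FALSE: lookahead is not POSIX syntax
compiles("a(?=b)", perl = TRUE)   # TRUE: PCRE understands lookahead
compiles("(?<=a)b", perl = TRUE)  # TRUE: lookbehind works too

# With PCRE, the lookahead behaves as expected:
grepl("a(?=b)", c("ab", "ac"), perl = TRUE)  # TRUE FALSE
```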
You can make R accept your regex by setting perl = TRUE, which switches matching to the PCRE library; this argument is available in the base regex functions (grep, grepl, sub, gsub, regexpr, gregexpr, regexec, strsplit, etc.):
output <- sub(
"\\?(?=([^'\\\\]*(\\\\.|'([^'\\\\]*\\\\.)*[^'\\\\]*'))*[^']*$)",
"!",
"This is a test string?",
perl = TRUE
)
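To check the fix end to end, here is a sketch that runs the question's pattern with perl = TRUE through a few of those functions. The second input, "a 'b?' c?", is an illustrative string, not from the question. My reading of the pattern is that it matches a ? only when the single quotes after it are balanced (i.e. a ? outside a quoted section), so on that input only the final ? should be replaced.

```r
# The pattern from the question, written as an ordinary (escaped) R string.
pattern <- "\\?(?=([^'\\\\]*(\\\\.|'([^'\\\\]*\\\\.)*[^'\\\\]*'))*[^']*$)"

# perl = TRUE hands the pattern to PCRE, which supports the lookahead.
sub(pattern, "!", "This is a test string?", perl = TRUE)
# returns "This is a test string!"

# The same argument works in the other base regex functions.
x <- "a 'b?' c?"
grepl(pattern, x, perl = TRUE)                    # TRUE
gsub(pattern, "!", x, perl = TRUE)                # "a 'b?' c!": quoted ? kept
regmatches(x, regexpr(pattern, x, perl = TRUE))   # "?"
```

Note that regmatches() itself takes no perl argument; it extracts whatever match data regexpr() (or gregexpr()) produced.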
Of course, it looks much less intimidating in R >= 4.0.0, which supports raw strings:
output <- sub(
R"(\?(?=([^'\\]*(\\.|'([^'\\]*\\.)*[^'\\]*'))*[^']*$))",
"!",
"This is a test string?",
perl = TRUE
)
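A raw string only changes how the pattern is written in source code, not the string R hands to the regex engine. The sketch below (assuming R >= 4.0.0) checks that the escaped and raw spellings produce the identical string; the variable names are illustrative.

```r
# Requires R >= 4.0.0 for raw string literals R"(...)".
escaped_pat <- "\\?(?=([^'\\\\]*(\\\\.|'([^'\\\\]*\\\\.)*[^'\\\\]*'))*[^']*$)"
raw_pat     <- R"(\?(?=([^'\\]*(\\.|'([^'\\]*\\.)*[^'\\]*'))*[^']*$))"

identical(escaped_pat, raw_pat)  # TRUE: same string, less escaping

# writeLines() shows the pattern exactly as the regex engine receives it,
# which is also the form you would paste into an online tester.
writeLines(raw_pat)
```

If a pattern ever contains the sequence )" itself, R also accepts dashes in the delimiter, e.g. r"-(...)-", so the raw string does not end early.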
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
