Regex to parse simple markdown with escaped characters without look-behind
Note: This has to work in JavaScript RegExp
I have to parse a string like this:
yo (p:abc-123-def) meets \(p:2) \(in the cinema\) \\ (p:3) (p:4\) won't
What I need to extract are all (<entity>:<id>) markups, ignoring escaped things like \(in the cinema\) or \\. From the above example, the regex should only match
(p:abc-123-def)
(p:3)
but not \(p:2) or (p:4\), since one of the brackets is escaped in each.
Now, I am still able to modify that markup so if there is a simpler way to do the whole thing I'm open to suggestions. If not, I'd need to be able to get those (<entity>:<id>) markups from a regex.
Something like this
(?<!\\)\([^(?<!\\)\(]*\)
would work but look-behind groups are not supported by all browsers.
Solution 1:[1]
It can get complex when backslashes are repeated many times, like: \\\\\\\\\\\\\\(p:1). You would need to know whether the number of backslashes is even or odd in order to know whether the ( is escaped or not.
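The even/odd rule can be sketched in a few lines. This is only an illustration: the helper name isEscaped is hypothetical and not part of the answer. It counts the unbroken run of backslashes directly in front of a character.

```javascript
// Hypothetical helper (name is an assumption): reports whether the character
// at `index` is escaped, i.e. preceded by an odd number of backslashes.
function isEscaped(str, index) {
  let backslashes = 0;
  // Walk left from the character and count the unbroken run of backslashes.
  for (let i = index - 1; i >= 0 && str[i] === "\\"; i--) {
    backslashes++;
  }
  return backslashes % 2 === 1;
}

const even = "\\\\(p:1)";   // two backslashes in the actual string, then (p:1)
const odd = "\\\\\\(p:1)";  // three backslashes in the actual string, then (p:1)

console.log(isEscaped(even, even.indexOf("("))); // false
console.log(isEscaped(odd, odd.indexOf("(")));   // true
```

With an even count the backslashes escape each other and the ( stays a real opening bracket. With an odd count the last backslash escapes the (.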
Secondly, a colon occurring within the parentheses might be escaped as well, and would then presumably not count as the separator.
So I would suggest working with something like (?:\\.|[^:)\\])*, which consumes escaped characters as pairs (\\.) and restricts unescaped characters to a class like [^:)\\].
So this is the result:
(?<!\\)(?:\\.)*\((?:\\.|[^:)\\])*:(?:\\.|[^:)\\])*\)
This uses look-behind, which is supported in recent versions of the popular browsers.
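As a quick check, the look-behind pattern can be run against the string from the question. This is a sketch that assumes an engine with look-behind support. The variable names are illustrative, and String.raw is used only so the backslashes can be typed as they appear in the question.

```javascript
// The look-behind pattern from above, with the global flag to collect all matches.
const lookbehindMarkup = /(?<!\\)(?:\\.)*\((?:\\.|[^:)\\])*:(?:\\.|[^:)\\])*\)/g;

// String.raw keeps every backslash exactly as typed.
const sample = String.raw`yo (p:abc-123-def) meets \(p:2) \(in the cinema\) \\ (p:3) (p:4\) won't`;

// match() returns null when nothing is found, so fall back to an empty array.
const lookbehindMatches = sample.match(lookbehindMarkup) || [];

console.log(lookbehindMatches); // [ '(p:abc-123-def)', '(p:3)' ]
```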
If look-behind is not an option, then capture the character that precedes the potential backslashes, and make a capture group for the part you need:
(?:[^\\]|^)((?:\\.)*\((?:\\.|[^:)\\])*:(?:\\.|[^:)\\])*\))
So here you need to work with the first captured group.
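A minimal sketch of reading that first group in JavaScript follows. The variable names are illustrative, and String.raw is used only so the backslashes can be typed as they appear in the question.

```javascript
// The look-behind-free pattern from above; group 1 holds the markup.
const markup = /(?:[^\\]|^)((?:\\.)*\((?:\\.|[^:)\\])*:(?:\\.|[^:)\\])*\))/g;

// String.raw keeps every backslash exactly as typed.
const input = String.raw`yo (p:abc-123-def) meets \(p:2) \(in the cinema\) \\ (p:3) (p:4\) won't`;

const found = [];
let match;
while ((match = markup.exec(input)) !== null) {
  // match[0] also contains the character before the markup, so use group 1.
  found.push(match[1]);
}

console.log(found); // [ '(p:abc-123-def)', '(p:3)' ]
```

Two caveats follow from how the pattern is built. Group 1 also includes any escaped backslash pairs that sit directly in front of the opening bracket. And because each match consumes the character before the markup, two markups with nothing between them, such as (p:1)(p:2), would yield only the first.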
Solution 2:[2]
This regex should work:
/(?<!\\)\([a-zA-Z]+\:[0-9a-zA-Z_-]+\)/g
Edit: This is written as a JavaScript regex literal.
Solution 3:[3]
One way could be to match what you don't want and to capture in a capturing group what you want to keep.
For example:
\\+\([^)]+\)|\([^)]+\\+\)|(\([^:]+:[^:]+\))
\\+\([^)]+\)   Match 1+ times a backslash followed by an opening ( till )
|   Or
\([^)]+\\+\)   Match ( till 1+ times a backslash and )
|   Or
(   Capturing group
\([^:]+:[^:]+\)   Match (, not :, then : and again not :, followed by )
)   Close capturing group
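const regex = /\\+\([^)]+\)|\([^)]+\\+\)|(\([^:]+:[^:]+\))/g;
// In the template literal, each "\\" stands for a single backslash in the actual string.
const str = `yo (p:abc-123-def) meets \\(p:2) \\(in the cinema\\) \\\\ (p:3) (p:4\\) won't`;
let m;
while ((m = regex.exec(str)) !== null) {
  // Group 1 is only set when the third alternative (an unescaped markup) matched.
  if (m[1] !== undefined) {
    console.log(m[1]);
  }
}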
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | |
