Regex to parse simple markdown with escaped characters without look-behind

Note: This has to work in JavaScript RegExp

I have to parse a string like this:

yo (p:abc-123-def) meets  \(p:2) \(in the cinema\) \\ (p:3) (p:4\) won't 

What I need to extract are all (<entity>:<id>) markups, ignoring escaped things like \(in the cinema\) or \\. From the above example, the regex should only match

(p:abc-123-def)
(p:3)

but not \(p:2) or (p:4\), since one of their brackets is escaped.

I am still able to modify that markup, so if there is a simpler way to do the whole thing, I'm open to suggestions. If not, I need to be able to get those (<entity>:<id>) markups with a regex.

Something like this

(?<!\\)\([^(?<!\\)\(]*\)

would work but look-behind groups are not supported by all browsers.



Solution 1:[1]

It can get complex when backslashes are repeated many times, like: \\\\\\\\\\\\\\(p:1). You would need to know whether the number of backslashes is even or odd in order to know whether the ( is escaped or not.
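
To make the even/odd rule concrete, here is a minimal sketch that counts the backslashes directly in front of a character. The helper name isEscaped is only an illustration and is not part of the question or of any library.

```javascript
// Hypothetical helper: returns true when the character at `index`
// is preceded by an odd number of backslashes, i.e. it is escaped.
function isEscaped(text, index) {
  let backslashes = 0;
  // Walk left from the character and count consecutive backslashes.
  for (let i = index - 1; i >= 0 && text[i] === "\\"; i--) {
    backslashes++;
  }
  return backslashes % 2 === 1;
}

// String.raw keeps the backslashes exactly as typed.
const sample = String.raw`\\(p:3) \(p:2) \\\(p:5)`;
console.log(isEscaped(sample, sample.indexOf("(p:3)"))); // false: two backslashes
console.log(isEscaped(sample, sample.indexOf("(p:2)"))); // true: one backslash
console.log(isEscaped(sample, sample.indexOf("(p:5)"))); // true: three backslashes
```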

Secondly, the colon occurring within the parentheses might be escaped as well, and would then presumably not count as the separator.

So I would suggest working with something like (?:\\.|[^:)\\])*, which consumes escaped characters (a backslash followed by any character, \\.) and restricts the unescaped characters to [^:)\\].

So this is the result:

(?<!\\)(?:\\.)*\((?:\\.|[^:)\\])*:(?:\\.|[^:)\\])*\)

This uses look-behind, which is supported in the latest versions of popular browsers.
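
If you are unsure whether a given browser supports look-behind, one possible check (a sketch, not something from the question) is to compile a tiny look-behind pattern at runtime and catch the error. The function name supportsLookbehind is made up for this example.

```javascript
// Hypothetical helper: returns true when this JavaScript engine
// can compile a look-behind assertion.
function supportsLookbehind() {
  try {
    // Using the RegExp constructor instead of a regex literal means an
    // unsupported pattern throws here rather than breaking script parsing.
    new RegExp("(?<!a)b");
    return true;
  } catch (error) {
    return false;
  }
}

console.log(supportsLookbehind() ? "look-behind available" : "use the capture-group variant");
```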

If look-behind is not an option, then capture the character that precedes the potential backslashes, and make a capture group for the part you need:

(?:[^\\]|^)((?:\\.)*\((?:\\.|[^:)\\])*:(?:\\.|[^:)\\])*\))

So here you need to work with the first captured group.
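
As a rough sketch of how that could be used, the loop below collects group 1 for every match in the string from the question. The helper name extractMarkups is an assumption made for this example.

```javascript
// Pattern from the answer above: group 1 holds the markup, while the
// non-capturing prefix consumes the character before it (or the string start).
const markupPattern = /(?:[^\\]|^)((?:\\.)*\((?:\\.|[^:)\\])*:(?:\\.|[^:)\\])*\))/g;

// Hypothetical helper: returns every unescaped (<entity>:<id>) markup.
function extractMarkups(text) {
  const found = [];
  markupPattern.lastIndex = 0; // start from the beginning on every call
  let match;
  while ((match = markupPattern.exec(text)) !== null) {
    found.push(match[1]);
  }
  return found;
}

// String.raw keeps the backslashes exactly as typed.
const input = String.raw`yo (p:abc-123-def) meets  \(p:2) \(in the cinema\) \\ (p:3) (p:4\) won't`;
console.log(extractMarkups(input)); // logs the two markups (p:abc-123-def) and (p:3)
```

Two things to keep in mind with this variant: because the pattern consumes the character before each markup, two markups with nothing between them, such as (p:1)(p:2), yield only the first one; and if a markup is directly preceded by escaped backslashes, as in \\(p:3), those backslashes are part of group 1.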

Solution 2:[2]

This regex should work, provided look-behind is available:

/(?<!\\)\([a-zA-Z]+\:[0-9a-zA-Z_]+\)/g

Edit: This regex is written for JavaScript.


Solution 3:[3]

One way could be to match what you don't want and to capture in a capturing group what you want to keep.

For example:

\\+\([^)]+\)|\([^)]+\\+\)|(\([^:]+:[^:]+\))


  • \\+\([^)]+\) Match one or more backslashes, then ( and everything up to the next )
  • | Or
  • \([^)]+\\+\) Match (, then any characters except ), then one or more backslashes and )
  • | Or
  • ( Capturing group
    • \([^:]+:[^:]+\) Match (, one or more characters other than :, then :, again one or more characters other than :, and a closing )
  • ) Close capturing group

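const regex = /\\+\([^)]+\)|\([^)]+\\+\)|(\([^:]+:[^:]+\))/g;
// Backslashes are doubled here because this is a JavaScript template literal.
const str = `yo (p:abc-123-def) meets  \\(p:2) \\(in the cinema\\) \\\\ (p:3) (p:4\\) won't`;
let m;

// Escaped markups match the first two alternatives and leave group 1 unset;
// only unescaped markups populate group 1.
while ((m = regex.exec(str)) !== null) {
  if (m[1] !== undefined) {
    console.log(m[1]);
  }
}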

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3