(*SKIP)(*FAIL) workaround in JavaScript RegExp

I have a regex pattern that works fine in regex101.com: ~<a .*?">(*SKIP)(*FAIL)|\bword\b

I am trying to make it a Regexp so it can be used in the replace() function in JavaScript.

The line of JavaScript code is:

var regex = new RegExp("~<a.*?\">(*SKIP)(*FAIL)|\\b"+ word + "\\b", 'g');

Where word is the word I'm trying to match.

When I run it though, the console shows the following error:

Uncaught (in promise) SyntaxError: Invalid regular expression:
/~<a.*?">(*SKIP)(*FAIL)|word/: Nothing to repeat

Am I escaping characters wrong?

I tried backslash-escaping every special character I could find (?, *, < and so on) in my JavaScript code and it still spat out that error.



Solution 1:[1]

You can work around the missing (*SKIP)(*FAIL) support in JavaScript using capturing groups in the pattern and a bit of code logic.
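As for the error itself: escaping is not the problem. JavaScript's RegExp has no backtracking control verbs, so `(*SKIP)` is parsed as a group that opens with the quantifier `*`, and that quantifier has nothing to repeat. A minimal sketch that reproduces this (the `compiles` helper is hypothetical, introduced only for illustration):

```javascript
// Hypothetical helper: returns true if the pattern source compiles as a RegExp.
function compiles(source, flags = 'g') {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false; // e.g. "Nothing to repeat"
    throw e;
  }
}

// The PCRE-style pattern with verbs vs. the capturing-group rewrite.
const withVerbs = String.raw`<a .*?">(*SKIP)(*FAIL)|\bword\b`;
const withGroup = String.raw`(<a .*?">)|\bword\b`;

console.log(compiles(withVerbs)); // false
console.log(compiles(withGroup)); // true
```

Backslash-escaping the `*` would make the pattern compile, but it would then match a literal `(*SKIP)(*FAIL)` string rather than act as a verb.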

Note the (*SKIP)(*FAIL) verb sequence is explained in my YT video called "Skipping matches in specific contexts (with SKIP & FAIL verbs)". You can also find a demo of JavaScript lookarounds for four different scenarios: extracting, replacing, removing and splitting.

Let's adjust the code for the current question. Let's assume word always consists of word characters (digits, letters or underscores).
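If that assumption does not hold and word may contain regex metacharacters, it would need escaping before being interpolated into the pattern. A minimal sketch, assuming a hypothetical `escapeRegExp` helper (the character class is the commonly used escaping idiom, not something from the question):

```javascript
// Hypothetical helper: escape characters that are special in a RegExp pattern.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const text = `a.b <a href="a.b">a.b</a> axb`;
const word = 'a.b';
const regex = new RegExp(String.raw`(<a .*?">)|\b${escapeRegExp(word)}\b`, 'gi');

// Without escaping, the dot would also match the "x" in "axb".
console.log(text.replace(regex, (match, group1) => group1 || 'buz'));
// => buz <a href="a.b">buz</a> axb
```

Note that `\b` only behaves as expected when word starts and ends with a word character; otherwise the boundaries would need to be replaced with something else.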

  1. Extracting: Capture the word into Group 1 and only extract Group 1 values:

const text = `foo <a href="foo.com">foo</a> foobar`;
const word = 'foo';
const regex = new RegExp(String.raw`<a .*?">|\b(${word})\b`, 'gi');
console.log(Array.from(text.matchAll(regex), x=>x[1]).filter(Boolean));
// => ["foo", "foo"] (the first word and the one in `>foo<`)

  2. Removing: Capture the context you need to keep into Group 1 and replace with a backreference to this group:

const text = `foo <a href="foo.com">foo</a> foobar`;
const word = 'foo';
const regex = new RegExp(String.raw`(<a .*?">)|\b${word}\b`, 'gi');
console.log(text.replace(regex, '$1')); // => " <a href=\"foo.com\"></a> foobar" (note the leading space)

  3. Replacing: Capture the context you need to keep into Group 1. In a callback function (or arrow function) passed as the replacement argument, return the Group 1 value when that group matched, otherwise return the replacement text:

const text = `foo <a href="foo.com">foo</a> foobar`;
const word = 'foo';
const regex = new RegExp(String.raw`(<a .*?">)|\b${word}\b`, 'gi');
console.log(text.replace(regex, (match, group1) => group1 || 'buz' ));
// => buz <a href="foo.com">buz</a> foobar

  4. Splitting: This is the most intricate scenario and it requires a bit more coding:

const text = `foo <a href="foo.com">foo</a> foobar`;
const word = 'foo';
const regex = new RegExp(String.raw`(<a .*?">)|\b${word}\b`, 'gi');

let m, res = [], offset = 0;
while ((m = regex.exec(text)) !== null) {
  if (m[1] === undefined) {    // Group 1 did not match: this is a real delimiter
    res.push(text.substring(offset, m.index)); // Add the chunk before the match
    offset = m.index + m[0].length;            // Set the new chunk start position
  }
  // Otherwise the match is the context to keep, so it stays in the current chunk
}
if (offset < text.length) {          // If there is any more text after offset
  res.push(text.substring(offset));  // add it to the result array
}
console.log(res);
// => ["", " <a href=\"foo.com\">", "</a> foobar"]
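The replacing and splitting scenarios above can be wrapped into reusable functions. The following is only a sketch: the names `buildRegex`, `replaceOutside` and `splitOutside` and their parameters are hypothetical, and both pattern sources are assumed to contain no capturing groups of their own.

```javascript
// Hypothetical helpers. `skip` is the pattern source for the context to leave
// untouched; `target` is the pattern source for what to match outside of it.
function buildRegex(skip, target) {
  return new RegExp(`(${skip})|${target}`, 'gi');
}

function replaceOutside(text, skip, target, replacement) {
  // Group 1 set => skipped context, so put it back unchanged.
  return text.replace(buildRegex(skip, target), (match, kept) =>
    kept !== undefined ? kept : replacement);
}

function splitOutside(text, skip, target) {
  const parts = [];
  let offset = 0;
  for (const m of text.matchAll(buildRegex(skip, target))) {
    if (m[1] === undefined) {            // a real delimiter, not skipped context
      parts.push(text.substring(offset, m.index));
      offset = m.index + m[0].length;
    }
  }
  if (offset < text.length) parts.push(text.substring(offset));
  return parts;
}

const text = `foo <a href="foo.com">foo</a> foobar`;
const skip = String.raw`<a .*?">`;
const target = String.raw`\bfoo\b`;

console.log(replaceOutside(text, skip, target, 'buz'));
// => buz <a href="foo.com">buz</a> foobar
console.log(splitOutside(text, skip, target));
// => ["", " <a href=\"foo.com\">", "</a> foobar"]
```

Returning the replacement from a callback also means any `$` in it is inserted literally instead of being read as a backreference.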

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Wiktor Stribiżew