Regex to match multiple variations with lookbehind conditionals while staying DRY

In JavaScript, I'm trying to match the following samples and capture just src/ for a find and replace that iterates over HTML and CSS files.

Samples to match:

1. url(src/imgs/...)
2. url("src/imgs/...")
3. url('src/imgs/...')
4. <img src="src/imgs/...">

I'm trying to avoid repetition in the regex and don't want multiple groups with almost identical expressions.

I created a group of lookbehinds that matches samples 1 & 4, but it does not match 2 & 3.

(((?<=url\()(?<=['"])?)|(?<=\<img\ssrc="))src\/

I can't figure out how to match all 4 samples in a DRY way without optional quantifiers, which aren't supported on lookbehind assertions.

A less elegant regex like the one below does work.

((?<=url\()|(?<=url\(['"])|(?<=\<img\ssrc="))src\/
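To check the pattern above against the four samples, here is a minimal sketch. It assumes an engine with lookbehind support (ES2018, e.g. a recent Node.js); the g flag and the 'dist/' replacement are additions for illustration only and are not part of the original pattern.

```javascript
// The pattern from above, copied as-is, plus the g flag so that
// replace() rewrites every occurrence in a longer file.
const pattern = /((?<=url\()|(?<=url\(['"])|(?<=\<img\ssrc="))src\//g;

const samples = [
  'url(src/imgs/...)',
  'url("src/imgs/...")',
  "url('src/imgs/...')",
  '<img src="src/imgs/...">',
];

// 'dist/' is a hypothetical replacement value, chosen only for illustration.
const replaced = samples.map((s) => s.replace(pattern, 'dist/'));

console.log(replaced.join('\n'));
```

Only the src/ that directly follows one of the three prefixes is rewritten; the src in the src= attribute name is left alone because it is not followed by a slash.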

Is there a more elegant solution than using two groups to account for variations in the samples 1 to 3?



Solution 1:[1]

Extracting data from CSS and HTML source with a regexp is generally not a good idea; dedicated parsers exist for that. That said, here is some code you might find helpful:

var samples = [
  'url(src/imgs/...)',
  'url("src/imgs/...")',
  'url(\'src/imgs/...\')',
  '<img src="src/imgs/...">',
  '<img src=\'src/imgs/...\'>',
  '<img src=\'src/imgs/...\' alt=\'\'>',
];

var report = [];

var re = new RegExp(
  '(?:'                   // prefix is
    + '(?:url\\()'        // 'url('
    + '|'                 // or 
    + '(?:<img\\s+src=)'  // '<img src='
  + ')'
  + '([\'"]?)'            // followed by an optional quote
  + '(.*?)'               // URL itself
  + '\\1'                 // followed by the quote (still optional)
  + '\\s?.*'              // 'img' specific optional suffix (other attributes)
  + '[)>]'                // enclosed by a '>' or a ')' symbol
);

samples.forEach(function( sample ){

  report.push( 
    re.exec(
      sample
    ).join("\t\t")
  );

})

console.log(report.join("\n"));

Output:

url(src/imgs/...)               
url("src/imgs/...")     "       src/imgs/...
url('src/imgs/...')     '       src/imgs/...
<img src="src/imgs/...">        "       src/imgs/...
<img src='src/imgs/...'>        '       src/imgs/...
<img src='src/imgs/...' alt=''>     '       src/imgs/...

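Note that this fails for unquoted values: with no quote for the backreference to anchor on, the lazy (.*?) captures an empty string, as the first output line shows for url(src/imgs/...). The same happens for an image with an unquoted src like this: <img src=src/imgs/...>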
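For the find-and-replace case in the question, one possible way to avoid the near-duplicate url( branches is to move the optional quote inside a single lookbehind, since JavaScript lookbehinds may be variable-length. The sketch below is not the code from the answer above; it assumes ES2018 lookbehind support, and rewriteSrc and the 'dist/' target are hypothetical names used only for illustration. Variant B shows the same idea without lookbehind, by capturing the prefix and re-emitting it with $1.

```javascript
// Variant A: one lookbehind; the optional quote lives inside it.
// Assumes ES2018 lookbehind support (variable-length lookbehind is allowed in JavaScript).
const withLookbehind = /(?<=url\(['"]?|<img\s+src=['"]?)src\//g;

// Variant B: no lookbehind; capture the prefix and put it back with $1.
const withCapture = /((?:url\(|<img\s+src=)['"]?)src\//g;

// rewriteSrc and the 'dist/' target are hypothetical, for illustration only.
function rewriteSrc(text, useLookbehind) {
  return useLookbehind
    ? text.replace(withLookbehind, 'dist/')
    : text.replace(withCapture, '$1dist/');
}

const samples = [
  'url(src/imgs/...)',
  'url("src/imgs/...")',
  "url('src/imgs/...')",
  '<img src="src/imgs/...">',
  "<img src='src/imgs/...' alt=''>",
  '<img src=src/imgs/...>', // unquoted attribute value
];

for (const sample of samples) {
  console.log(rewriteSrc(sample, true) + '\t\t' + rewriteSrc(sample, false));
}
```

Both variants assume that src is the first attribute of the img tag, as in the samples; markup with other attributes before src would need a looser prefix or a real parser.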

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1