Regex match a string that contains multiple substrings, each ending with a number from 0 to a fixed n

String -

123|456|...|789 I am a long string with substring0 and substring1 also substring2 something more and end with substring3

The number of substrings is determined by a fixed number n (e.g. n=3 in this case). The first substring ends with 0, the second with 1, and the last one with n, i.e. they are sorted from left to right. Apart from the trailing number, every substring is the same string. For example -

123|456|...|789 I am a long string with abcd0 and abcd1 also abcd2 something more and end with abcd3

How can we regex match such a line if the value of n is known in advance?

Sorry, I have absolutely no idea how this can be achieved, hence I didn't add anything more.

Thanks SO.



Solution 1:[1]

Assuming:

  • There are just as many substrings as n (and in order)

"With n=3, it is guaranteed that there will be only three substrings - substring0, substring1 and substring3 in that order from left to right."

  • You do indeed want to match the whole line;

"How can we regex match such a line if the value of n is known in advance?"


Try something like:

^.*?(?<!\S)(\S+)0(?!\S)(?:.*?(?<!\S)\1\d+(?!\S)){n}.*$



  • ^ - Start-line anchor;
  • .*? - Match 0+ (Lazy) characters;
  • (?<!\S)(\S+)0(?!\S) - A 1st capture group of 1+ non-whitespace chars followed by a zero, wrapped in two lookarounds to assert a whole whitespace-delimited word;
  • (?:.*?(?<!\S)\1\d+(?!\S)){n} - A non-capture group matched n times (replace {n} with the actual number, e.g. {3}): 0+ (Lazy) characters, then a backreference to the 1st group followed by 1+ digits, wrapped in those same lookarounds;
  • .* - 0+ (Greedy) characters other than newline;
  • $ - End-line anchor.
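
As a rough sketch of how the pattern above could be used once n is known, the Python snippet below substitutes the number into the {n} quantifier. The helper names build_pattern and build_strict_pattern are hypothetical and not part of the original answer. Note that \d+ accepts any digits, so the answer's pattern does not check that the suffixes actually run 1, 2, ..., n; the second helper is an assumed stricter variant that spells out each suffix, using a named group so the backreference is not misread when a digit follows it.

```python
import re


def build_pattern(n: int) -> re.Pattern:
    """Answer's pattern with {n} filled in (hypothetical helper name)."""
    # A word ending in 0, then n more words made of the same stem plus digits.
    return re.compile(
        r"^.*?(?<!\S)(\S+)0(?!\S)(?:.*?(?<!\S)\1\d+(?!\S)){%d}.*$" % n
    )


def build_strict_pattern(n: int) -> re.Pattern:
    """Assumed stricter variant: suffixes must be exactly 1, 2, ..., n in order."""
    head = r"^.*?(?<!\S)(?P<stem>\S+)0(?!\S)"
    # (?P=stem) is used instead of \1 so that "\1" + "1" is not read as group 11.
    tail = "".join(
        r".*?(?<!\S)(?P=stem)%d(?!\S)" % i for i in range(1, n + 1)
    )
    return re.compile(head + tail + r".*$")


line = (
    "123|456|...|789 I am a long string with abcd0 and abcd1 "
    "also abcd2 something more and end with abcd3"
)

match = build_pattern(3).match(line)
print(bool(match))        # True
print(match.group(1))     # abcd

# Out-of-order suffixes pass the answer's pattern but not the strict variant.
shuffled = "abcd0 abcd2 abcd1 abcd3"
print(bool(build_pattern(3).match(shuffled)))         # True
print(bool(build_strict_pattern(3).match(shuffled)))  # False
```

Which variant fits depends on how much the input can be trusted: if the lines are guaranteed to be ordered, as the assumption above states, the shorter pattern should be enough.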

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: Stack Overflow