Capture text after multiple optional strings into named groups with Regex

I am trying to extract multiple strings using different patterns from one long string. Here is an example of the input string:

[Update 2]Number of students: 5[New]Break at 1:45 pm\nStudents involved are: John, Joseph, Maria\nLunch at 2:00pm\nActivities remaining: long jump, shuffle

There are three prefixes, each marking data I want to extract from the text that follows it: 'Students involved are:', 'Activities remaining:', 'Number of students:'. I managed to extract these values into named groups using the following Regex:

let pattern = /(?<=Number of students: )(?<number>[^\n]+).*?(?<=Students involved are: )(?<students>[^\n]+).*?(?<=Activities remaining: )(?<activities>[^\n]+)/gms
let match = pattern.exec(s)
// Named groups are exposed on match.groups, not on the match itself
const num = match.groups.number;
const activities = match.groups.activities;

The above works. However, I run into an issue when one of the strings is missing. All three prefixes I am searching for are optional. How can I modify the regex to handle optional patterns? Or is there a better way of accomplishing this? Thanks!



Solution 1:[1]

I'm not sure you need lookbehind assertions for that use case...

To answer your question, you can wrap each individual pattern in a non-capturing group followed by a question mark:

const r = /(?:Desc.1:\s*(?<tag1>.*?))?(?:Desc.2:\s*(?<tag2>.*?))?(?:Desc.3:\s*(?<tag3>.*?))?/
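As a rough sketch of how that idea could look with the prefixes from the question: each optional group below first skips ahead lazily to its prefix and then captures to the end of the line. The s flag, the [^\n]+ captures, the sample string with real newlines, and the helper name extractInOrder are assumptions made for illustration, not part of the original answer.

```javascript
// Sample input from the question, assuming "\n" stands for real newlines.
const sample =
  "[Update 2]Number of students: 5[New]Break at 1:45 pm\n" +
  "Students involved are: John, Joseph, Maria\n" +
  "Lunch at 2:00pm\n" +
  "Activities remaining: long jump, shuffle";

// Each optional group skips lazily to its prefix (the s flag lets "." cross
// newlines) and then captures the rest of that line.
const orderedPattern =
  /^(?:.*?Number of students: (?<number>[^\n]+))?(?:.*?Students involved are: (?<students>[^\n]+))?(?:.*?Activities remaining: (?<activities>[^\n]+))?/s;

// Hypothetical helper: returns the named groups; missing ones are undefined.
function extractInOrder(text) {
  // Every group is optional, so exec always returns a match here.
  return orderedPattern.exec(text).groups;
}

const inOrder = extractInOrder(sample);
console.log(inOrder.number, "|", inOrder.students, "|", inOrder.activities);
```

This only works when the prefixes appear in the order written in the pattern; a prefix that shows up before an earlier-matched one is left undefined. Note also that number captures the whole remainder of its line, just as [^\n]+ does in the question.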

If the values can come in any order, you can use a global match and a disjunction:

const r = /x(?<tag1>.*)|y(?<tag2>.*)|z(?<tag3>.*)/g

for (const {groups: {tag1, tag2, tag3}} of source.matchAll(r)) {
 ...
}
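Filled in with the prefixes from the question, the disjunction approach might look like the sketch below. The helper name extractFields, the [^\n]+ captures, and the sample string with real newlines are assumptions for illustration; only one named group is set per match, so the loop merges them into a single object.

```javascript
// Sample input from the question, assuming "\n" stands for real newlines.
const input =
  "[Update 2]Number of students: 5[New]Break at 1:45 pm\n" +
  "Students involved are: John, Joseph, Maria\n" +
  "Lunch at 2:00pm\n" +
  "Activities remaining: long jump, shuffle";

// One alternative per prefix; each value runs to the end of its line.
const fieldPattern =
  /Number of students: (?<number>[^\n]+)|Students involved are: (?<students>[^\n]+)|Activities remaining: (?<activities>[^\n]+)/g;

// Hypothetical helper: collect whichever groups matched, in any order.
function extractFields(text) {
  const result = {};
  for (const { groups } of text.matchAll(fieldPattern)) {
    for (const [name, value] of Object.entries(groups)) {
      // Groups from the alternatives that did not match are undefined.
      if (value !== undefined) result[name] = value;
    }
  }
  return result;
}

const fields = extractFields(input);
console.log(fields);
```

A prefix that is absent from the input simply never shows up as a key in the result, so the caller can test for it with something like `"students" in fields`.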


Also, FYI, the flags you're using don't make a lot of sense to me:

  • g is useful to match several times (e.g. with "".matchAll(/regexp/g)), but it is useless otherwise
  • m makes ^ and $ assertions match the start and end of lines on top of their usual duty, but you're not using them
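A small sketch of how these flags behave may help. The test strings and variable names are made up for illustration; the third flag from the question, s, is included because it is the one that lets "." match newlines.

```javascript
const text = "a: 1\nb: 2";

// Without s, "." stops at the newline, so the two parts cannot be bridged.
const withoutS = /a: (?<a>\d+).*?b: (?<b>\d+)/.exec(text);

// m only changes ^ and $; it does not let "." cross the newline either.
const withM = /a: (?<a>\d+).*?b: (?<b>\d+)/m.exec(text);

// s (dotAll) is the flag that lets ".*?" span lines.
const withS = /a: (?<a>\d+).*?b: (?<b>\d+)/s.exec(text);

// With g, exec keeps state in lastIndex, so a second call resumes after the
// first match instead of starting over.
const stateful = /a/g;
const firstCall = stateful.exec("a");
const secondCall = stateful.exec("a");

console.log(withoutS, withM, withS.groups, firstCall !== null, secondCall);
```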

Solution 2:[2]

Here's my attempt:

"^\[[^\]]+\](Number of students: )*(?<number>[^\n]+)\\n(Students involved are: )*(?<students>[^\n]+)\\n(Activities remaining: )*(?<activities>[^\n]+)"

The differences between my pattern and yours are the following:

  • added ^\[[^\]]+\] at the beginning to match [<any characters>]
  • added * after each optional prefix group
  • added \\n between the three prefix/value pairs

I've tested this regex with these two examples:

  • [Update 2]Number of students: 5[New]Break at 1:45 pm\nStudents involved are: John, Joseph, Maria\nLunch at 2:00pm\nActivities remaining: long jump, shuffle
  • [Update 2]5[New]Break at 1:45 pm\nJohn, Joseph, Maria\nLunch at 2:00pm\nlong jump, shuffle

Does it work for you?

P.S. To make the pattern any more efficient, more sample inputs would be needed.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Pygy
Solution 2 lemon