Capture text after multiple optional strings into named group with Regex
I am trying to extract multiple strings using different patterns from one long string. Here is an example of the input string:
[Update 2]Number of students: 5[New]Break at 1:45 pm\nStudents involved are: John, Joseph, Maria\nLunch at 2:00pm\nActivities remaining: long jump, shuffle
There are three prefixes, each used to extract the data that follows it: 'Students involved are:', 'Activities remaining:', 'Number of students:'. I managed to extract the above into named groups using the following Regex:
let pattern = /(?<=Number of students: )(?<number>[^\n]+).*?(?<=Students involved are: )(?<students>[^\n]+).*?(?<=Activities remaining: )(?<activities>[^\n]+)/gms
let match = pattern.exec(s)
// Named capture groups are exposed on match.groups, not on the match itself
const num = match.groups.number;
const activities = match.groups.activities;
The above works. However, I run into an issue when one of the strings is missing. All three prefixes I am searching for are optional. How can I modify the regex to handle optional patterns? Or is there a better way of accomplishing this? Thanks!
Solution 1:[1]
I'm not sure you need lookbehind assertions for that use case...
To answer your question, you can wrap your individual patterns inside non-capturing groups followed by a question mark:
const r = /(?:Desc.1:\s*(?<tag1>.*?))?(?:Desc.2:\s*(?<tag2>.*?))?(?:Desc.3:\s*(?<tag3>.*?))?/
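As a rough sketch of this first approach applied to the prefixes from the question: each optional group skips ahead lazily to its prefix and is dropped altogether when the prefix is absent. It assumes the input contains real newline characters and that the fields always appear in this order; sample, pattern and extract are illustrative names. The captures use [^\n]+ as in the question, because a lazy .*? with nothing required after it matches as little as possible, typically an empty string.

```javascript
// Sample input from the question, assuming "\n" stands for a real newline.
const sample =
  "[Update 2]Number of students: 5[New]Break at 1:45 pm\n" +
  "Students involved are: John, Joseph, Maria\n" +
  "Lunch at 2:00pm\n" +
  "Activities remaining: long jump, shuffle";

// Each (?: ... )? group is optional. The s flag lets .*? skip across newlines.
const pattern =
  /^(?:.*?Number of students: (?<number>[^\n]+))?(?:.*?Students involved are: (?<students>[^\n]+))?(?:.*?Activities remaining: (?<activities>[^\n]+))?/s;

// Hypothetical helper: returns the named groups; missing fields are undefined.
function extract(text) {
  return pattern.exec(text).groups;
}

const all = extract(sample);
console.log(all.students);   // "John, Joseph, Maria"
console.log(all.activities); // "long jump, shuffle"

// Same kind of input without the "Students involved are:" line.
const partial = extract("Number of students: 5\nActivities remaining: long jump");
console.log(partial.number, partial.students, partial.activities);
```

As with the question's own pattern, number picks up the rest of its line here ("5[New]Break at 1:45 pm"); tightening that capture, for example to \d+, is one option if only the digits are wanted.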
If the values can come in any order, you can use a global match and a disjunction:
const r = /x(?<tag1>.*)|y(?<tag2>.*)|z(?<tag3>.*)/g
for (const {groups: {tag1, tag2, tag3}} of source.matchAll(r)) {
...
}
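For illustration, here is how the disjunction approach might look with the actual prefixes from the question. This is only a sketch: it assumes the input contains real newline characters, and extractFields is a made-up helper name.

```javascript
// Sample input from the question, assuming "\n" stands for a real newline.
const source =
  "[Update 2]Number of students: 5[New]Break at 1:45 pm\n" +
  "Students involved are: John, Joseph, Maria\n" +
  "Lunch at 2:00pm\n" +
  "Activities remaining: long jump, shuffle";

// One alternative per prefix; each match fills exactly one named group.
const fieldPattern =
  /Number of students: (?<number>[^\n]+)|Students involved are: (?<students>[^\n]+)|Activities remaining: (?<activities>[^\n]+)/g;

// Hypothetical helper: collects the first value found for each field.
function extractFields(text) {
  const result = {};
  for (const { groups } of text.matchAll(fieldPattern)) {
    for (const [name, value] of Object.entries(groups)) {
      // Groups from the alternatives that did not match are undefined.
      if (value !== undefined && !(name in result)) {
        result[name] = value;
      }
    }
  }
  return result;
}

const fields = extractFields(source);
console.log(fields.students);   // "John, Joseph, Maria"
console.log(fields.activities); // "long jump, shuffle"

// Order does not matter, and absent prefixes simply leave no key behind.
const reordered = extractFields("Activities remaining: a\nNumber of students: 3");
console.log(reordered);
```

Because every prefix is matched independently, a missing prefix needs no special handling: its key is just absent from the returned object.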
Also, FYI, the flags you're using don't make a lot of sense to me:
- g is useful to match several times (e.g. with "".matchAll(/regexp/g)), but it is useless otherwise
- m makes the ^ and $ assertions match the start and end of lines on top of their usual duty, but you're not using them
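A small sketch of what these two flags change, using a throwaway string (all variable names are only for illustration):

```javascript
const text = "a1\nb2";

// Without g, exec() always searches from the start of the string.
const once = /\d/;
const first = once.exec(text)[0];
const again = once.exec(text)[0];

// With g, exec() resumes from lastIndex, so repeated calls walk the matches.
const many = /\d/g;
const hits = [many.exec(text)[0], many.exec(text)[0]];
const done = many.exec(text); // no more matches, so this is null

// With m, ^ also matches right after a newline, not only at the string start.
const withoutM = /^b/.test(text);
const withM = /^b/m.test(text);

console.log(first, again, hits, done, withoutM, withM);
```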
Solution 2:[2]
Here's my attempt:
"^\[[^\]]+\](Number of students: )*(?<number>[^\n]+)\\n(Students involved are: )*(?<students>[^\n]+)\\n(Activities remaining: )*(?<activities>[^\n]+)"
The differences between mine and yours are the following:
- added ^\[[^\]]+\] at the beginning to match [<any characters>]
- added * over the optional parts of your string
- added \\n between the paired three parts
I've tested this regex with these two examples:
[Update 2]Number of students: 5[New]Break at 1:45 pm\nStudents involved are: John, Joseph, Maria\nLunch at 2:00pm\nActivities remaining: long jump, shuffle
[Update 2]5[New]Break at 1:45 pm\nJohn, Joseph, Maria\nLunch at 2:00pm\nlong jump, shuffle
Does it work for you?
PS: any attempt to improve efficiency would need more sample strings to match against.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Pygy |
Solution 2 | lemon |