Regex to validate string format and structure based on a given set of labels
I use a regex to check that a string has a valid structure and format.
The Regex I use:
^(?!.+(more|enough|less)+$)(^(?:more|enough|less))[a-z_,]+$
The 1st part of the regex says: the string must not end with one of the labels, and it must not contain whitespace.
The 2nd part of the regex says: start with one of the labels, followed by any characters. This is where my problem is. The rule should be: start with one of the labels, followed by any characters. Additionally, the other labels may appear again, but each only once, and each must be preceded by a ",".
The format and structure of the string can contain a "," as delimiter and follows the rule:
- [[more|enough|less{topic}][_{aspect}]
- [[more|enough|less{topic}][_{aspect}],[[more|enough|less{topic}][_{aspect}]
Each of the labels more, enough and less may appear only once in the string.
My Regex works for nearly all combinations, except:
- lesschips,lessfish
- lesschipsmorebier_cold
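The two failures can be reproduced directly. This is a minimal Python sketch, assuming Python's `re` module treats this pattern the same way as the regex flavour used in the question (which is not stated); the names `ORIGINAL` and `accepts` are made up for illustration.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"^(?!.+(more|enough|less)+$)(^(?:more|enough|less))[a-z_,]+$")


def accepts(text: str) -> bool:
    """Return True if the original pattern matches the string."""
    return ORIGINAL.match(text) is not None


# Both strings should be rejected, but the pattern accepts them:
# [a-z_,]+ also consumes a repeated label or a label without a leading comma.
for text in ("lesschips,lessfish", "lesschipsmorebier_cold"):
    print(text, accepts(text))
```

The character class `[a-z_,]+` has no notion of labels, so any restriction on repeated or unseparated labels has to come from additional lookaheads or from a stricter main pattern.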
For testing I use the following valid combinations:
- morefish
- morefish_fried
- lesschips
- morebier_cold,lesschips,enoughwater
- lesschips,morebier_cold
... and the following invalid combinations:
- more
- morefish => the example ends with a whitespace
- morefish
- moreless
- lessmore
- leschips
- lesschips,lessfish
- lesschipsmorebier_cold
- morebier_cold,lesschips,enough
Solution 1:[1]
You can use a second negative lookahead to rule out the same alternative occurring twice anywhere in the string.
And a third negative lookahead to rule out two of the alternatives occurring in the same part, that is, without a comma between them.
As you are not matching spaces, you can use \S to match any non-whitespace char instead of . which can also match a space.
^(?!\S*(?:more|enough|less)$)(?!\S*?(more|enough|less)\S*?\1)(?!\S*?(?:more|enough|less)[^\s,]*?(?:more|enough|less))(?:more|enough|less)[a-z_]*(?:,(?:more|enough|less)[a-z_]*)*$
The pattern in parts matches:
- ^ Start of string
- (?!\S*(?:more|enough|less)$) Do not match the words at the end of string
- (?!\S*?(more|enough|less)\S*?\1) Do not match the same word twice in the string
- (?!\S*?(?:more|enough|less)[^\s,]*?(?:more|enough|less)) Do not match any two of the words together in the same part without matching a comma
- (?:more|enough|less)[a-z_]* Start the match with any of the alternatives and optional chars a-z or _
- (?:,(?:more|enough|less)[a-z_]*)* Optionally repeat matching a comma and again one of the alternatives and optional chars a-z or _
- $ End of string
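To check the pattern against the test strings from the question, here is a minimal Python sketch. It assumes Python's `re` module, which the question does not mention; `PATTERN`, `VALID`, `INVALID` and `is_valid` are illustrative names. The invalid entry that ends with whitespace is written out as "morefish " with an explicit trailing space.

```python
import re

# The pattern from the answer, split per part for readability.
PATTERN = re.compile(
    r"^"
    r"(?!\S*(?:more|enough|less)$)"                           # no label at the very end
    r"(?!\S*?(more|enough|less)\S*?\1)"                       # no label used twice
    r"(?!\S*?(?:more|enough|less)[^\s,]*?(?:more|enough|less))"  # no two labels in one part
    r"(?:more|enough|less)[a-z_]*"                            # first part
    r"(?:,(?:more|enough|less)[a-z_]*)*"                      # further comma-separated parts
    r"$"
)

VALID = [
    "morefish",
    "morefish_fried",
    "lesschips",
    "morebier_cold,lesschips,enoughwater",
    "lesschips,morebier_cold",
]

INVALID = [
    "more",
    "morefish ",  # trailing whitespace
    "moreless",
    "lessmore",
    "leschips",
    "lesschips,lessfish",
    "lesschipsmorebier_cold",
    "morebier_cold,lesschips,enough",
]


def is_valid(text: str) -> bool:
    """Return True if the whole string matches the pattern.

    fullmatch is used because in Python `$` also matches just before a
    trailing newline, which would let "morefish\\n" through with match().
    """
    return PATTERN.fullmatch(text) is not None


for text in VALID + INVALID:
    print(repr(text), is_valid(text))
```

With this setup every string in `VALID` should print `True` and every string in `INVALID` should print `False`.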
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
