Regex to validate string format and structure based on a given set of labels

I use a regex to check whether a string has a valid structure and format.

The Regex I use: ^(?!.+(more|enough|less)+$)(^(?:more|enough|less))[a-z_,]+$

  • The 1st part of the regex says: the string must not end with one of the labels, and it must not contain whitespace.

  • The 2nd part of the regex says: start with one of the labels, followed by any characters. This is where my problem is. The rule should be: start with one of the labels, followed by any characters. The other labels may appear as well, but each only once, and each must be preceded by a ",".

The string may contain "," as a delimiter and follows these rules:

  • [more|enough|less]{topic}[_{aspect}]
  • [more|enough|less]{topic}[_{aspect}],[more|enough|less]{topic}[_{aspect}]

Each of the labels more, enough and less may appear only once in the string.

My Regex works for nearly all combinations, except:

  • lesschips,lessfish
  • lesschipsmorebier_cold

For testing I use the following valid combinations:

  • morefish
  • morefish_fried
  • lesschips
  • morebier_cold,lesschips,enoughwater
  • lesschips,morebier_cold

... and the following invalid combinations:

  • more
  • morefish => the example ends with a whitespace
  • morefish
  • moreless
  • lessmore
  • leschips
  • lesschips,lessfish
  • lesschipsmorebier_cold
  • morebier_cold,lesschips,enough
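
The behaviour described above can be reproduced with a short Python sketch. It assumes that Python's re module treats the pattern the same way as the engine used in the question, which is not named there. The names ORIGINAL, VALID, INVALID, wrongly_rejected and wrongly_accepted are chosen only for this illustration.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"^(?!.+(more|enough|less)+$)(^(?:more|enough|less))[a-z_,]+$")

VALID = [
    "morefish",
    "morefish_fried",
    "lesschips",
    "morebier_cold,lesschips,enoughwater",
    "lesschips,morebier_cold",
]

INVALID = [
    "more",
    "morefish ",  # ends with a whitespace
    "moreless",
    "lessmore",
    "leschips",
    "lesschips,lessfish",
    "lesschipsmorebier_cold",
    "morebier_cold,lesschips,enough",
]

# Valid strings the pattern rejects, and invalid strings it accepts.
wrongly_rejected = [s for s in VALID if not ORIGINAL.fullmatch(s)]
wrongly_accepted = [s for s in INVALID if ORIGINAL.fullmatch(s)]

print("wrongly rejected:", wrongly_rejected)
print("wrongly accepted:", wrongly_accepted)
```

Under that assumption, only lesschips,lessfish and lesschipsmorebier_cold end up in wrongly_accepted. The character class [a-z_,]+ is the reason: it consumes commas and further labels without any check.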


Solution 1:[1]

You can add a negative lookahead that prevents the same alternative from occurring twice anywhere in the string.

A further negative lookahead prevents two of the alternatives from occurring in the same part, that is, without a comma between them.

As you are not matching spaces, you can use \S to match any non-whitespace char instead of . which can also match a space.
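
A minimal Python sketch of the difference between . and \S follows. The sample string and the variable names are made up for the illustration.

```python
import re

text = "more fish"

# "." matches any character except a newline, so it also matches the space.
dot_match = re.fullmatch(r".+", text)

# "\S" matches only non-whitespace characters, so the space stops the match.
non_space_match = re.fullmatch(r"\S+", text)

print(dot_match is not None)        # True
print(non_space_match is not None)  # False
```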

^(?!\S*(?:more|enough|less)$)(?!\S*?(more|enough|less)\S*?\1)(?!\S*?(?:more|enough|less)[^\s,]*?(?:more|enough|less))(?:more|enough|less)[a-z_]*(?:,(?:more|enough|less)[a-z_]*)*$

The pattern in parts matches:

  • ^ Start of string
  • (?!\S*(?:more|enough|less)$) Do not match the words at the end of string
  • (?!\S*?(more|enough|less)\S*?\1) Do not match the same words twice in the string
  • (?!\S*?(?:more|enough|less)[^\s,]*?(?:more|enough|less)) Do not match any of the words together in the same part without matching a comma
  • (?:more|enough|less)[a-z_]* Start the match with any of the alternatives and optional chars a-z or _
  • (?:,(?:more|enough|less)[a-z_]*)* Optionally repeat matching a comma and again one of the alternatives and optional chars a-z or _
  • $ End of string
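
The breakdown above can be written as a commented pattern and checked against the combinations from the question. This is a sketch assuming Python's re module. re.VERBOSE is used only so the parts can be annotated, and PATTERN, is_valid, VALID and INVALID are names made up for the illustration.

```python
import re

PATTERN = re.compile(
    r"""
    ^
    (?!\S*(?:more|enough|less)$)                              # no label at the very end
    (?!\S*?(more|enough|less)\S*?\1)                          # the same label never occurs twice
    (?!\S*?(?:more|enough|less)[^\s,]*?(?:more|enough|less))  # no two labels without a comma between them
    (?:more|enough|less)[a-z_]*                               # first part: a label, then optional a-z or _
    (?:,(?:more|enough|less)[a-z_]*)*                         # further parts, each introduced by a comma
    $
    """,
    re.VERBOSE,
)


def is_valid(s: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return PATTERN.fullmatch(s) is not None


VALID = [
    "morefish",
    "morefish_fried",
    "lesschips",
    "morebier_cold,lesschips,enoughwater",
    "lesschips,morebier_cold",
]

INVALID = [
    "more",
    "morefish ",  # ends with a whitespace
    "moreless",
    "lessmore",
    "leschips",
    "lesschips,lessfish",
    "lesschipsmorebier_cold",
    "morebier_cold,lesschips,enough",
]

for s in VALID + INVALID:
    print(repr(s), is_valid(s))
```

With these lists, every string in VALID is accepted and every string in INVALID is rejected, including the two combinations that the pattern from the question let through.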


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1