Regular expression using non-greedy matching -- confusing result

I thought I understood how the non-greedy modifier works, but am confused by the following result:

  • Regular Expression: (,\S+?)_sys$

  • Test String: abc,def,ghi,jkl_sys

  • Desired result: ,jkl_sys <- last field including comma

  • Actual result: ,def,ghi,jkl_sys

My use case: I have a comma-separated string whose last field will end in "_sys" (e.g. ,sometext_sys). I want to match only the last field, and only if it ends with _sys.

I am using the non-greedy (?) modifier to return the shortest possible match (only the last field including the comma), but it returns all but the first field (i.e. the longest match).

What am I missing?

I used https://regex101.com/ to test, in case you want to see a live example.



Solution 1:[1]

It sounds like you're looking for a string that ends with "_sys", sits at the end of the source string, and is preceded by a comma.

,\s*(\w+_sys)$

I added the \s* to allow for optional whitespace after the comma.

No non-greedy modifiers necessary.

The parens are around \w+_sys so you can capture just that string, without the comma and optional whitespace.
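
As for why the original pattern over-matches: a regex engine reports the leftmost match it can find, and a non-greedy quantifier only affects how far the match extends from a given starting point, not where it starts. Because \S also matches commas, the attempt that begins at the first comma can succeed by expanding all the way to _sys at the end. The following is a minimal sketch comparing the two patterns, assuming Python's re module (the question does not name a regex flavor, so this is only an illustration):

```python
import re

text = "abc,def,ghi,jkl_sys"

# Original pattern: the lazy quantifier cannot move the start of the match,
# and \S is allowed to consume commas, so the match begins at the first comma.
original = re.search(r"(,\S+?)_sys$", text)
print(original.group(0))  # ,def,ghi,jkl_sys

# Pattern from this answer: \w cannot match a comma, so attempts that start
# at earlier commas fail and only the last field fits.
fixed = re.search(r",\s*(\w+_sys)$", text)
print(fixed.group(0))  # ,jkl_sys
print(fixed.group(1))  # jkl_sys
```

The capture group holds the field without the comma; the full match (group 0) includes it.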

Solution 2:[2]

You can use

,[^,]+_sys$

The pattern matches:

  • , Match the last comma
  • [^,]+ Match 1+ occurrences of any character except ,
  • _sys Match literally
  • $ End of string

If the last field must not contain whitespace (including newlines):

,[^\s,]+_sys$
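
A small sketch of how the two patterns differ, assuming Python's re module; the names ANY_CHARS, NO_SPACE, and last_field are hypothetical helpers introduced only for this illustration:

```python
import re

# Both patterns come from this answer; the constant names are illustrative.
ANY_CHARS = re.compile(r",[^,]+_sys$")     # field may contain whitespace
NO_SPACE = re.compile(r",[^\s,]+_sys$")    # field may not contain whitespace


def last_field(pattern, line):
    """Return the last field (with its comma) if it ends in _sys, else None."""
    m = pattern.search(line)
    return m.group(0) if m else None


print(last_field(ANY_CHARS, "abc,def,ghi,jkl_sys"))  # ,jkl_sys
print(last_field(NO_SPACE, "abc,def,ghi,jkl_sys"))   # ,jkl_sys
print(last_field(ANY_CHARS, "abc,def, jkl_sys"))     # , jkl_sys
print(last_field(NO_SPACE, "abc,def, jkl_sys"))      # None
print(last_field(ANY_CHARS, "abc,def_sys,ghi"))      # None
```

Since neither character class can cross a comma, the match can only start at the last comma, which is why no non-greedy quantifier is needed.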

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Andy Lester
Solution 2    The fourth bird