Regex to get all numbers after a character

I have strings that are expected to be in the format of something like

"C 1,13,7,2,55" I would expect matches to be [1,13,7,2,55].

I want to match all numbers in the "csv" portion of the string, but only if it comes after "C " (note the space after the 'C').

This comes from user input, so I want to account for case, multiple spaces between tokens, accidental double commas, etc.

I.e. "c 1 , 12,15 , 8 , 9,10,11" I want matches to be [1,12,15,8,9,10,11]

But I only want to attempt to match on numbers after the "C" char (case-insensitive).

So "1,2 , 4,5" and "d 12456, 9890" should fail.

Here's the regex I have half-baked so far.

Note: This will ultimately be ported to PHP, so I will be using preg_match_all.

/(?<=C)*\d+/gim

I use a positive lookbehind (but match as many times as needed) for the "C" char. Then match on 1 or more digits globally.

I haven't created all my unit tests yet, but I think this may work.

Is there a better way to do this? Is matching on 1 or more positive lookbehinds standard?

Why don't I need to include a \s* after the 'C' in the positive lookbehind? When would including the 'm' multi-line flag even make a difference here?

Thanks!



Solution 1:[1]

The pattern /(?<=C)*\d+/gim is not valid in JavaScript, for example, because a quantifier is not allowed after a lookbehind assertion.

If you want to write it in JavaScript, where a quantifier inside the lookbehind is supported, you can get all the digits after a C at the start of the string with:

(?<=^C [\d, ]*)\d+


In PHP, the quantifier in (?<=C)*\d+ makes the lookbehind optional, so the pattern also matches 8 and 9 in a string such as 8,9 C 1,13,7,2,55
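A quick way to see this is to run the question's pattern through preg_match_all. The snippet below is only a sketch: the variable names are illustrative, and the g and m flags are dropped because g is not a PCRE modifier in PHP and m has no effect on a pattern without anchors.

```php
<?php
// The question's pattern, minus the JavaScript-style "g" and "m" flags.
$regex = '/(?<=C)*\d+/i';

// Two numbers appear BEFORE the "C".
$subject = '8,9 C 1,13,7,2,55';

preg_match_all($regex, $subject, $matches);

// The quantified lookbehind is optional, so 8 and 9 are reported as well.
print_r($matches[0]);
```

Because the lookbehind can be skipped, the pattern behaves like a plain \d+ and does not enforce the "only after C" rule.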

A quantifier with unbounded length inside a lookbehind assertion is not supported in PHP, so you cannot use (?<=C\h+)\d+ where \h+ would match 1+ horizontal whitespace characters.
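As a rough check of that limitation, the sketch below tries to compile the unbounded lookbehind. It assumes a stock PHP build with its bundled PCRE library; the @ operator only hides the compilation warning so that the return value can be inspected.

```php
<?php
// \h+ inside the lookbehind has no upper length bound, so PCRE refuses
// to compile the pattern and preg_match_all() returns false.
$result = @preg_match_all('/(?<=C\h+)\d+/i', 'C 1,13,7,2,55', $matches);

var_dump($result);
```

A return value of false (as opposed to 0) signals an error rather than "no matches", which is why the two should be told apart with a strict comparison.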


If you are using PHP, you can make use of the \G anchor to match only consecutive numbers after the first C character.

For a single line, you don't need the multiline flag. For multiple lines you do need it, so that the ^ anchor matches at the start of every line.
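To illustrate the effect of the m flag, here is a small sketch that runs the \G-based pattern over a three-line input, once without and once with the flag. The input text and variable names are made up for the example.

```php
<?php
// Three lines: the first and third start with "C"/"c", the second with "d".
$text = "C 1,2\nd 3,4\nc 5 , 6";

$single = '/(?:^\h*C\h+|\G(?!^))\h*,*\h*\K\d+/i';  // ^ matches only at the very start
$multi  = '/(?:^\h*C\h+|\G(?!^))\h*,*\h*\K\d+/im'; // ^ matches at every line start

preg_match_all($single, $text, $withoutM);
preg_match_all($multi, $text, $withM);

print_r($withoutM[0]); // numbers from the first line only
print_r($withM[0]);    // numbers from every line that starts with C
```

In both cases the "d" line contributes nothing, because \h does not match a newline and so the \G chain cannot continue across a line break.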

(?:^\h*C\h+|\G(?!^))\h*,*\h*\K\d+

The pattern matches:

  • (?: Non capture group
    • ^ Start of string
    • \h*C\h+ Match optional spaces, then C and 1+ spaces
    • | Or
    • \G(?!^) Assert the position at the end of the previous match (not at the start)
  • ) Close the non capture group
  • \h*,*\h*\K Match optional commas between optional spaces, then \K forgets what was matched so far
  • \d+ Match 1 or more digits


$regex = '/(?:^\h*C\h+|\G(?!^))\h*,*\h*\K\d+/i';
$strings = [
    "C 1,13,7,2,55",
    "c    1   ,  12,15     ,   8     ,   9,10,11",
    "1,2  ,  4,5",
    "d 12456, 9890"
];

foreach ($strings as $s) {
    if (preg_match_all($regex, $s, $matches)) {
        print_r($matches[0]);
    }
}

Output

Array
(
    [0] => 1
    [1] => 13
    [2] => 7
    [3] => 2
    [4] => 55
)
Array
(
    [0] => 1
    [1] => 12
    [2] => 15
    [3] => 8
    [4] => 9
    [5] => 10
    [6] => 11
)
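If the numbers are needed as integers, the pattern can be wrapped in a small helper. This is only a sketch: the function name extractNumbersAfterC is hypothetical and not part of the original answer, and it treats both "no match" and a regex error as an empty result.

```php
<?php
// Hypothetical helper: returns the numbers that follow a leading "C"
// (case-insensitive), or an empty array if the input does not qualify.
function extractNumbersAfterC(string $input): array
{
    $regex = '/(?:^\h*C\h+|\G(?!^))\h*,*\h*\K\d+/i';

    // preg_match_all() returns 0 when nothing matches and false on error;
    // both are treated as "no numbers" here.
    if (!preg_match_all($regex, $input, $matches)) {
        return [];
    }

    // Convert the matched digit strings to integers.
    return array_map('intval', $matches[0]);
}

var_export(extractNumbersAfterC("c 1 , 12,15 , 8 , 9,10,11"));
var_export(extractNumbersAfterC("d 12456, 9890"));
```

Returning an empty array for rejected input keeps the calling code simple, although a caller that needs to tell invalid input apart from a regex failure would have to check the preg_match_all result itself.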

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1