Regex to match two or more spaces

I'm trying to parse some attributes from a modem's AT output. My regex is as follows:

([^:]*):\s*([^\s]*)

The sample output is as follows:

LTE SSC1 bw  : 20 MHz           LTE SSC1 chan: 2850
LTE SSC2 state:INACTIVE         LTE SSC2 band: B20
LTE SSC2 bw  : 10 MHz           LTE SSC2 chan: 6300
EMM state:     Registered       Normal Service
RRC state:     RRC Connected
IMS reg state: NOT REGISTERED   IMS mode:    Normal

This mostly works, but it fails when an attribute's value contains more characters after the first whitespace. For example, the match for "LTE SSC2 bw" has a group 2 value of "10" when it should be "10 MHz".
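
The truncation is easy to reproduce. The following is a minimal sketch in Python (chosen only for illustration; the question does not say which regex engine is in use, and the variable names are hypothetical). It runs the pattern above against one line of the sample output:

```python
import re

# The pattern from the question, unchanged.
original = re.compile(r"([^:]*):\s*([^\s]*)")

# One line of the sample output.
line = "LTE SSC2 bw  : 10 MHz           LTE SSC2 chan: 6300"

first = original.search(line)
key, value = first.group(1), first.group(2)

# [^\s]* stops at the first whitespace character, so the unit is cut off.
print(repr(key), repr(value))
```

Group 2 ends at the first space after "10", which is why "MHz" is lost. Group 1 also keeps the padding before the colon, so the key usually needs trimming.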

Ideally, the regex should match each attribute name exactly and capture its value in a group.

Hope this makes sense and thanks for your help.



Solution 1:[1]

If there are always at least two spaces between the key-value pairs, you can use

([^:\s][^:]*):[^\S\r\n]*(\S+(?:[^\S\r\n]\S+)*)


Details:

  • ([^:\s][^:]*) - Group 1: a char other than whitespace and : and then zero or more non-: chars
  • : - a colon
  • [^\S\r\n]* - zero or more whitespaces other than CR and LF chars
  • (\S+(?:[^\S\r\n]\S+)*) - Group 2: one or more non-whitespaces, then zero or more repetitions of a whitespace other than CR and LF chars and then one or more non-whitespace chars.
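
As a rough illustration, the pattern can be applied to the sample output from the question. This is a sketch in Python, which is an assumption since the question names no language, and `parse_pairs` is a hypothetical helper name. Because `[^:]*` in Group 1 can also match line breaks, the sketch applies the pattern one line at a time. Run over the whole text at once, the trailing "Normal Service" on the EMM line would be absorbed into the next line's key.

```python
import re

SAMPLE = """\
LTE SSC1 bw  : 20 MHz           LTE SSC1 chan: 2850
LTE SSC2 state:INACTIVE         LTE SSC2 band: B20
LTE SSC2 bw  : 10 MHz           LTE SSC2 chan: 6300
EMM state:     Registered       Normal Service
RRC state:     RRC Connected
IMS reg state: NOT REGISTERED   IMS mode:    Normal
"""

# The pattern from Solution 1, unchanged.
PAIR = re.compile(r"([^:\s][^:]*):[^\S\r\n]*(\S+(?:[^\S\r\n]\S+)*)")


def parse_pairs(text):
    """Collect key/value pairs line by line (hypothetical helper)."""
    pairs = {}
    for line in text.splitlines():
        for match in PAIR.finditer(line):
            # Group 1 may carry padding before the colon, so trim it.
            pairs[match.group(1).strip()] = match.group(2)
    return pairs


pairs = parse_pairs(SAMPLE)
for key, value in pairs.items():
    print(f"{key!r}: {value!r}")
```

With this approach the second value on the EMM line ("Normal Service") is not captured, because no colon follows it on that line.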

Solution 2:[2]

You can try with this regex:

(?<attribute>[A-Z]{3} [^:]+): *(?<value1>.*?)(?> {2,}|$)(?<value2>[^:]+$)?

The groups you have are the following:

  • Group 1 attribute: will contain the attribute name
  • Group 2 value1: will contain the attribute value
  • Group 3 value2: will contain the optional attribute second value (for the fourth line)

Explanation:

  • (?<attribute>[A-Z]{3} [^:]+): Group 1
    • [A-Z]{3}: three uppercase letters
    • " ": a single literal space
    • [^:]+: any combination of characters other than colon
  • ": *": a colon followed by zero or more spaces
  • (?<value1>.*?): Group 2
    • .*?: any characters, matched lazily (as few as possible while still allowing the rest of the pattern to match)
  • (?> {2,}|$): Atomic group (not a lookahead: it consumes what it matches) that matches
    • " {2,}": two or more spaces (end of first inline attribute:value)
    • |: or
    • $: end of string (end of second inline attribute:value)
  • (?<value2>[^:]+$)?: Group 3
    • [^:]+: any combination of characters other than colon
    • $: end of string

You can refer to each group by its name.
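
A minimal sketch of how the named groups might be read, written in Python as an assumption (the answer does not name a language). Two adaptations are made for Python's `re` module and are not part of the original answer: named groups use the `(?P<name>...)` spelling, and the atomic group `(?>...)` is replaced by a non-capturing group `(?:...)`, since `re` only supports atomic groups from Python 3.11. For these sample lines the swap should not change the matches, because the optional group that follows can always match nothing. The pattern is applied per line so that `$` means the end of each line.

```python
import re

# Three lines of the sample output from the question.
LINES = [
    "LTE SSC1 bw  : 20 MHz           LTE SSC1 chan: 2850",
    "EMM state:     Registered       Normal Service",
    "IMS reg state: NOT REGISTERED   IMS mode:    Normal",
]

# Solution 2's pattern, adapted for Python (see the notes above):
# (?<name>...) becomes (?P<name>...), and (?>...) becomes (?:...).
ATTR = re.compile(
    r"(?P<attribute>[A-Z]{3} [^:]+): *"
    r"(?P<value1>.*?)(?: {2,}|$)"
    r"(?P<value2>[^:]+$)?"
)

rows = []
for line in LINES:
    for match in ATTR.finditer(line):
        rows.append(
            (
                match.group("attribute").strip(),  # trim padding before ':'
                match.group("value1"),
                match.group("value2"),  # None when there is no second value
            )
        )

for row in rows:
    print(row)
```

Only the EMM line yields a `value2`. On the other lines the text after the gap contains a colon, so it is matched as a separate attribute instead.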


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Wiktor Stribiżew
Solution 2 lemon