Ruby and splitting strings with regex

I have a string like

"A;BB;C[1;22];DDD[11;2;33];EEEEE[1111]"

I want to split this into

["A","BB","C[1;22]","DDD[11;2;33]","EEEEE[1111]"]

The letters and numbers stand in for arbitrary strings of one or more characters.

My regex is

/(?<!(\w+)\[)(\;(?!((\w+)((\;)(\w+)){0,}\])))/ 

https://regex101.com/r/fWNHBB/2

But I can't get it to run in Ruby. Can anyone help me here?



Solution 1:[1]

You can use

text.scan(/(?:\[[^\]\[]*\]|[^;])+/)

Details:

  • (?: - start of a non-capturing group:
    • \[ - a [ char
    • [^\]\[]* - zero or more chars other than [ and ]
    • \] - a ] char
  • | - or
    • [^;] - any single char other than a ; char
  • )+ - end of the group, repeat one or more times.

See the Ruby demo:

text = "A;BB;C[1;22];DDD[11;2;33];EEEEE[1111]"
puts text.scan(/(?:\[[^\]\[]*\]|[^;])+/)

Output:

A
BB
C[1;22]
DDD[11;2;33]
EEEEE[1111]
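
One behaviour worth checking before relying on this approach: the pattern needs at least one character per match, so scan never returns an empty string. The sketch below illustrates that with made-up inputs; the constant FIELD and the helper name fields are hypothetical and used only here.

```ruby
# Pattern from the answer above: a run of bracketed groups or non-';' chars.
FIELD = /(?:\[[^\]\[]*\]|[^;])+/

# Hypothetical helper name, used only for this illustration.
def fields(text)
  text.scan(FIELD)
end

p fields("A;BB;C[1;22]")  #=> ["A", "BB", "C[1;22]"]
p fields("A;;C[1;22]")    #=> ["A", "C[1;22]"]  (the empty field between ';;' is dropped)
```

If empty fields must be preserved, a split-based approach may be a better fit.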

Solution 2:[2]

I have assumed brackets are matching and not overlapping. That is, every left bracket is followed by a right bracket with no left or right bracket between and every right bracket is preceded by a left bracket with no left or right bracket between.

With that proviso, you can split the string as follows.

str = "A;BB;C[1;22];DDD[11;2;33];EEEEE[1111]"
rgx = /;(?![^\[\]]*\])/
str.split(rgx)
  #=> ["A", "BB", "C[1;22]", "DDD[11;2;33]", "EEEEE[1111]"]

The regular expression can be broken down as follows.

;           # match ';'
(?!         # begin negative lookahead
  [^\[\]]*  # match zero or more chars other than '[' and ']'
  \]        # match ']'
)           # end negative lookahead
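
To see the lookahead at work, the following sketch lists the positions of all semicolons and of those the pattern accepts as split points. The helper match_positions is a hypothetical name introduced only for this illustration; it relies on String#index with a start offset.

```ruby
str = "A;BB;C[1;22];DDD[11;2;33];EEEEE[1111]"
rgx = /;(?![^\[\]]*\])/

# Hypothetical helper: collect the index of every match of rgx in str.
def match_positions(str, rgx)
  positions = []
  pos = 0
  while (i = str.index(rgx, pos))
    positions << i
    pos = i + 1   # continue searching just past this match
  end
  positions
end

all_semicolons   = match_positions(str, /;/)
split_semicolons = match_positions(str, rgx)

p split_semicolons                   #=> [1, 4, 12, 25]
p all_semicolons - split_semicolons  #=> [8, 19, 21]
```

The rejected positions are exactly the semicolons inside brackets: from each of them, a ']' can be reached without first crossing a '[' or ']'.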

One could use the following regular expression to confirm that brackets are matching and not overlapping.

\A[^\[\]]*(?:\[[^\[\]]*\][^\[\]]*)*[^\[\]]*\z
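
One way to combine the check with the split is sketched below. The constants VALID and SPLIT and the helper safe_split are hypothetical names for this illustration, and raising ArgumentError is just one possible way to report bad input. String#match? assumes Ruby 2.4 or later.

```ruby
# Validation and split patterns from the answer above.
VALID = /\A[^\[\]]*(?:\[[^\[\]]*\][^\[\]]*)*[^\[\]]*\z/
SPLIT = /;(?![^\[\]]*\])/

# Hypothetical helper: split only when the bracket assumption holds.
def safe_split(str)
  unless str.match?(VALID)
    raise ArgumentError, "unbalanced or nested brackets: #{str.inspect}"
  end
  str.split(SPLIT)
end

p safe_split("A;C[1;22]")  #=> ["A", "C[1;22]"]

begin
  safe_split("A;C[1;22")   # missing ']'
rescue ArgumentError => e
  puts e.message
end
```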

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2