Ruby and splitting strings with regex
I have a string like
"A;BB;C[1;22];DDD[11;2;33];EEEEE[1111]"
I want to split this into
["A","BB","C[1;22]","DDD[11;2;33]","EEEEE[1111]"]
The letters and numbers stand for arbitrary strings of one or more characters.
My regex is like
/(?<!(\w+)\[)(\;(?!((\w+)((\;)(\w+)){0,}\])))/
https://regex101.com/r/fWNHBB/2
But I can't get it to run in Ruby. Can anyone help me here?
Solution 1:[1]
You can use
text.scan(/(?:\[[^\]\[]*\]|[^;])+/)
Details:
(?: - start of a non-capturing group:
    \[ - a [ char
    [^\]\[]* - zero or more chars other than [ and ]
    \] - a ] char
| - or
    [^;] - any single char other than a ; char
)+ - end of the group, repeat one or more times.
See the Ruby demo:
text = "A;BB;C[1;22];DDD[11;2;33];EEEEE[1111]"
puts text.scan(/(?:\[[^\]\[]*\]|[^;])+/)
Output:
A
BB
C[1;22]
DDD[11;2;33]
EEEEE[1111]
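One behaviour worth knowing when choosing between scan and split: because the pattern requires at least one character per match, scan silently skips empty fields, whereas split keeps them as empty strings. The following is a minimal sketch of that difference; the names FIELD and fields_by_scan are hypothetical and only used for illustration.

```ruby
# Pattern taken from the answer above; FIELD and fields_by_scan are
# illustrative names, not part of the original answer.
FIELD = /(?:\[[^\]\[]*\]|[^;])+/

def fields_by_scan(text)
  # With no capture groups, scan returns an array of whole matches.
  text.scan(FIELD)
end

p fields_by_scan("A;BB;C[1;22];DDD[11;2;33];EEEEE[1111]")
# => ["A", "BB", "C[1;22]", "DDD[11;2;33]", "EEEEE[1111]"]

p fields_by_scan("A;;B")   # => ["A", "B"]      (empty field is dropped)
p "A;;B".split(";")        # => ["A", "", "B"]  (empty field is kept)
```

If empty fields carry meaning in your data, a split-based approach may be the better fit.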
Solution 2:[2]
I have assumed brackets are matching and not overlapping. That is, every left bracket is followed by a right bracket with no left or right bracket between and every right bracket is preceded by a left bracket with no left or right bracket between.
With that proviso you can do that as follows.
str = "A;BB;C[1;22];DDD[11;2;33];EEEEE[1111]"
rgx = /;(?![^\[\]]*\])/
str.split(rgx)
#=> ["A", "BB", "C[1;22]", "DDD[11;2;33]", "EEEEE[1111]"]
The regular expression can be broken down as follows.
; # match ';'
(?! # begin negative lookahead
[^\[\]]* # match zero or more chars other than '[' and ']'
\] # match ']'
) # end negative lookahead
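The breakdown above can also live inside the pattern itself by using Ruby's extended mode (the x flag), which ignores unescaped whitespace and allows # comments within the regex. This is a sketch of the same expression written that way; the constant name SEPARATOR is an assumption made for the example.

```ruby
# Same pattern as rgx above, written in extended (x) mode.
# SEPARATOR is an illustrative name only.
SEPARATOR = /
  ;            # a literal semicolon
  (?!          # begin negative lookahead
    [^\[\]]*   # zero or more chars that are not square brackets
    \]         # then a closing bracket
  )            # end negative lookahead
/x

str = "A;BB;C[1;22];DDD[11;2;33];EEEEE[1111]"
p str.split(SEPARATOR)
# => ["A", "BB", "C[1;22]", "DDD[11;2;33]", "EEEEE[1111]"]
```

A semicolon is treated as a separator only when the next bracket after it is not a closing one, which is why the bracket-matching assumption matters.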
One could use the following regular expression to confirm that brackets are matching and not overlapping.
\A[^\[\]]*(?:\[[^\[\]]*\][^\[\]]*)*[^\[\]]*\z
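As a sketch of how the check and the split might be combined, the validation expression can guard the split so that malformed input fails loudly instead of producing a misleading result. The names BALANCED and split_fields, and the choice to raise ArgumentError, are assumptions made for this example.

```ruby
# Validation pattern copied from the answer above.
# BALANCED and split_fields are illustrative names.
BALANCED = /\A[^\[\]]*(?:\[[^\[\]]*\][^\[\]]*)*[^\[\]]*\z/

def split_fields(str)
  # Reject input whose brackets are unmatched or nested.
  unless str.match?(BALANCED)
    raise ArgumentError, "unbalanced or nested brackets: #{str.inspect}"
  end
  # Split on semicolons that are not inside a bracket pair.
  str.split(/;(?![^\[\]]*\])/)
end

p split_fields("A;BB;C[1;22]")   # => ["A", "BB", "C[1;22]"]

begin
  split_fields("A;B[1;2")        # missing closing bracket
rescue ArgumentError => e
  puts e.message
end
```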
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
