Java regex to split a string around parentheses
I am trying to split a mathematical expression on its parentheses using regex. For instance, I want the expression (3x^2)/(y-17x)+x^y to become:
(3x^2)
/
(y-17x)
+x^y
where each line is its own string in an array. The regex I am currently using is:
split("(?<=[*/])|(?=[*/])");
This almost works. It gives me:
(
3x^2
)
/
(
y-17x
)
+x^y
I am fairly new to regex and have not been able to figure out how to keep the parentheses together with their contents. Thanks for the help!
Solution 1:[1]
This is really quite hard to do. For example, what if I write: (3 + (2*y) + 5) * 10? That's parens-in-parens.
More generally, parsing a mathematical expression like this is fundamentally not something regex is good at. Instead, you want to build a parse tree, or use a library that builds one for you. You want to turn the above into something like:
        *
       / \
      +   10
     / \
    +   5
   / \
  3   *
     / \
    2   y
The idea is: given such a tree, it should be obvious how you'd calculate the result, and it should also be somewhat obvious that turning the string into that tree is not particularly difficult (though not exactly trivial either).
Regexes aren't particularly useful for this task.
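To make the tree idea concrete, here is a minimal recursive-descent sketch. It is not from the original answer: the class and method names (`ExprParser`, `Node`, `Num`, `Var`, `BinOp`, `parse`) are made up for illustration, and it only assumes integer literals, single-word variables, the operators + - * /, and parentheses. Implicit multiplication (3x) and ^ are deliberately left out.

```java
import java.util.Map;

final class ExprParser {
    // A parse-tree node; eval computes its value given variable bindings.
    interface Node { double eval(Map<String, Double> vars); }

    static final class Num implements Node {
        final double value;
        Num(double value) { this.value = value; }
        public double eval(Map<String, Double> vars) { return value; }
    }

    static final class Var implements Node {
        final String name;
        Var(String name) { this.name = name; }
        public double eval(Map<String, Double> vars) {
            Double v = vars.get(name);
            if (v == null) throw new IllegalArgumentException("Unbound variable: " + name);
            return v;
        }
    }

    static final class BinOp implements Node {
        final char op;
        final Node left, right;
        BinOp(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
        public double eval(Map<String, Double> vars) {
            double a = left.eval(vars), b = right.eval(vars);
            switch (op) {
                case '+': return a + b;
                case '-': return a - b;
                case '*': return a * b;
                default:  return a / b; // the parser only creates '+', '-', '*', '/'
            }
        }
    }

    private final String src;
    private int pos = 0;

    private ExprParser(String src) { this.src = src; }

    static Node parse(String text) {
        ExprParser p = new ExprParser(text);
        Node root = p.sum();
        p.skipSpaces();
        if (p.pos != text.length()) throw new IllegalArgumentException("Unexpected input at index " + p.pos);
        return root;
    }

    // sum := product (('+' | '-') product)*
    private Node sum() {
        Node left = product();
        while (true) {
            if (accept('+')) left = new BinOp('+', left, product());
            else if (accept('-')) left = new BinOp('-', left, product());
            else return left;
        }
    }

    // product := atom (('*' | '/') atom)*
    private Node product() {
        Node left = atom();
        while (true) {
            if (accept('*')) left = new BinOp('*', left, atom());
            else if (accept('/')) left = new BinOp('/', left, atom());
            else return left;
        }
    }

    // atom := number | variable | '(' sum ')'
    private Node atom() {
        if (accept('(')) {
            Node inner = sum();
            if (!accept(')')) throw new IllegalArgumentException("Expected ')' at index " + pos);
            return inner;
        }
        int start = pos; // accept() has already skipped leading whitespace
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected a value at index " + pos);
        String token = src.substring(start, pos);
        return Character.isDigit(token.charAt(0)) ? new Num(Double.parseDouble(token)) : new Var(token);
    }

    // Consumes the expected character (after optional whitespace) if it is next.
    private boolean accept(char expected) {
        skipSpaces();
        boolean hit = pos < src.length() && src.charAt(pos) == expected;
        if (hit) pos++;
        return hit;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }
}
```

With y bound to 4.0, evaluating the tree for "(3 + (2*y) + 5) * 10" gives 160.0. Nested parentheses need no special handling here: atom() simply calls sum() again, and the recursion tracks the nesting.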
At any rate, if you want to split on paired parens, consider that the paired parens may themselves contain parenthesized expressions. Java's regex engine has no recursion construct, so a single pattern cannot match parens nested to arbitrary depth, and you're hosed. You would have to impose arbitrary (and no doubt not acceptable for this assignment) limitations such as 'no parens-in-parens' or 'not more than 3 nested parens-in-parens', which is icky.
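If nested parens do have to be supported, one alternative to a regex is a plain character scan with a depth counter. The following is only a sketch under the assumption that the parentheses are balanced; `TopLevelSplitter` and `split` are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

final class TopLevelSplitter {

    // Splits an expression into top-level "(...)" groups and the text between them.
    // Throws IllegalArgumentException if the parentheses are not balanced.
    static List<String> split(String expr) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (char c : expr.toCharArray()) {
            if (c == '(') {
                // A new top-level group starts: flush the text collected so far.
                if (depth == 0 && current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
                depth++;
                current.append(c);
            } else if (c == ')') {
                if (depth == 0) {
                    throw new IllegalArgumentException("Unmatched ')' in: " + expr);
                }
                depth--;
                current.append(c);
                // The top-level group just closed: emit it as one part.
                if (depth == 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unmatched '(' in: " + expr);
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("(3x^2)/(y-17x)+x^y"));
    }
}
```

For "(3x^2)/(y-17x)+x^y" this yields the four parts (3x^2), /, (y-17x) and +x^y. For "(3 + (2*y) + 5) * 10" the whole outer group stays together as one part, followed by " * 10".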
.split is in turn the wrong tool for the job, on top of regular expressions in general being the wrong tool. Just use the plain regular expression API (Pattern and Matcher) for this, not split. Split requires you to write a regex that describes what the separator looks like. When you're looking for specific things, or when the input doesn't really look like a sequence of tokens and separators (as is the case here), it's usually much, much simpler to describe the thing you are looking for instead of the separator, which is essentially the thing you aren't looking for. So let's do that:
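import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A node is either "(...)" up to the nearest ")" or a run of non-"(" characters.
Pattern p = Pattern.compile("\\s*(\\(.*?\\)|[^(]+)\\s*");
Matcher m = p.matcher("(3x^2)/(y-17x)+x^y");
while (m.find()) {
    System.out.println(m.group(1));
}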
This prints:
(3x^2)
/
(y-17x)
+x^y
which seems to be what you wanted. I get the sense, though, that you think this helps you 'resolve' the expression into a value. It won't: you need that tree, and this is a dead end.
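Since the question asks for the pieces as strings in an array, the same pattern can be wrapped in a small helper. This is a sketch, not part of the original answer: `NodeExtractor` and `nodes` are hypothetical names, and the trim() call is an added assumption that surrounding whitespace is not wanted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NodeExtractor {
    // Same pattern as above: "(...)" up to the nearest ")" or a run of non-"(" characters.
    private static final Pattern NODE = Pattern.compile("\\s*(\\(.*?\\)|[^(]+)\\s*");

    // Collects every matched node into an array, dropping surrounding whitespace.
    static String[] nodes(String expr) {
        List<String> out = new ArrayList<>();
        Matcher m = NODE.matcher(expr);
        while (m.find()) {
            String node = m.group(1).trim();
            if (!node.isEmpty()) {
                out.add(node);
            }
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String node : nodes("(3x^2)/(y-17x)+x^y")) {
            System.out.println(node);
        }
    }
}
```

The no-nesting limitation still applies: because .*? stops at the nearest close paren, an input such as "(3 + (2*y) + 5) * 10" is cut after "(3 + (2*y)" rather than after the outer group.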
The regular expression describes what 'nodes' you want: Either an open paren, and then everything until the nearest close paren, or, if there is no open paren, everything until the nearest open paren. I've festooned it with 2 useful addons:
- It skips whitespace before each node and after a parenthesized pair. (A non-parenthesized node can still end in whitespace, because the greedy [^(]+ consumes it first; call trim() on the result if that matters.)
- It captures the actual content as 'group 1', so that .group(1) gives you this. In regexes you do this with parentheses: foo(ba+r) matches the string "foobaaaar", and .group(1) would return "baaaar" when it does.
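The capturing-group behaviour described in the last bullet can be checked in isolation. A small sketch, where `GroupDemo` and `captured` are made-up names for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupDemo {
    // Returns the text captured by group 1 of foo(ba+r),
    // or null if the whole input does not match the pattern.
    static String captured(String input) {
        Matcher m = Pattern.compile("foo(ba+r)").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(captured("foobaaaar")); // prints "baaaar"
    }
}
```

Group 0 (m.group() with no argument) would be the entire match, "foobaaaar"; the parentheses in the pattern are what make group 1 available.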
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | rzwitserloot |
