Get text inside brackets along with splitting delimiters in regex java?
I have a multiline string that is delimited by a set of different delimiters:
A Z DelimiterB B X DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H
I need to split that string by the delimiters, but if some words are inside brackets, the whole bracket should be extracted as a single token even if it contains a delimiter. The tokens should be extracted as follows:
A Z
DelimiterB
B X
DelimiterA
(C DelimiterA D) (extract with brackets)
DelimiterB
(E DelimiterA F)
DelimiterB
G
DelimiterA
H
Currently I am using this expression to split by the delimiters while keeping them in the result:
(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB)))
I tried the following, but it does not work. How can I make it work?
((?=\()|(?<=\))|(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB))))
Java code:
String txt = "A DelimiterB B DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H";
// Inside a Java string literal, the regex escapes \( and \) must be written as \\( and \\)
String[] texts = txt.split("((?=\\()|(?<=\\))|(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB))))");
for (String word : texts) {
    System.out.println(word);
}
Solution 1:[1]
IMO, matching is easier than splitting
Since the delimiters are also needed in the output, I suggest matching the tokens we need instead of splitting. Based on the example given, we have the following patterns to capture:
1. (C DelimiterA D) - A bracket containing a word, a delimiter and a word, which is "\\(\\w+ (DelimiterA|DelimiterB) \\w+\\)".
2. DelimiterB - A whole delimiter, which is "(DelimiterA|DelimiterB)".
3. B, B X - One or more words which are not delimiters.
How do we check that a word is not a delimiter?
We can check that the " " between words is neither preceded nor followed by a delimiter (see "Regex not operator"), which is "\\w+((?<!(DelimiterA|DelimiterB))\\s(?!(DelimiterA|DelimiterB))\\w+)*".
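As a rough check of the lookaround idea, the sketch below applies only the word-run pattern to a short input. The class and method names (WordRunDemo, runs) are made up for illustration. A space is consumed only when no delimiter sits directly before or after it, so a run of ordinary words stays together and never absorbs a neighbouring delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordRunDemo {
    // The word-run pattern from the text, used on its own (illustrative only).
    static final Pattern WORD_RUN = Pattern.compile(
        "\\w+((?<!(DelimiterA|DelimiterB))\\s(?!(DelimiterA|DelimiterB))\\w+)*");

    // Collects every match of WORD_RUN in the given text, in order.
    static List<String> runs(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD_RUN.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints "A Z", "DelimiterB" and "B X" on separate lines.
        runs("A Z DelimiterB B X").forEach(System.out::println);
    }
}
```

Used alone, this pattern still matches DelimiterB as a single word, which is why the combined expression lists the delimiter alternative before the word-run alternative.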
import java.util.Scanner;

public class SplitWithCustomDelimiter {
    public static void main(String[] args) {
        String txt = "A Z DelimiterB B X DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H";
        // Scanner can accept different sources (String, File, InputStream, ...)
        Scanner scanner = new Scanner(txt);
        // Scanner.findAll requires Java 9 or later
        scanner.findAll(
                "\\(\\w+ (DelimiterA|DelimiterB) \\w+\\)" +   // bracketed group, kept whole
                "|(DelimiterA|DelimiterB)" +                  // a delimiter on its own
                "|\\w+((?<!(DelimiterA|DelimiterB))\\s(?!(DelimiterA|DelimiterB))\\w+)*"  // run of non-delimiter words
            )
            .map(matchResult -> matchResult.group())
            .forEach(System.out::println);
    }
}
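On Java versions before 9, where Scanner.findAll is not available, the same three alternatives can be run through java.util.regex.Pattern and Matcher. This is a minimal sketch, assuming the delimiters are the literal words DelimiterA and DelimiterB. The class name BracketAwareTokenizer and the method tokenize are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketAwareTokenizer {
    // Assumed delimiter set, written as a non-capturing group.
    private static final String DELIM = "(?:DelimiterA|DelimiterB)";

    // Alternatives are tried in order at each position:
    // bracketed group first, then a bare delimiter, then a run of words.
    private static final Pattern TOKEN = Pattern.compile(
        "\\(\\w+ " + DELIM + " \\w+\\)"
        + "|" + DELIM
        + "|\\w+(?:(?<!" + DELIM + ")\\s(?!" + DELIM + ")\\w+)*");

    // Returns the matched tokens in the order they appear in the text.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String txt = "A Z DelimiterB B X DelimiterA (C DelimiterA D) "
            + "DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H";
        tokenize(txt).forEach(System.out::println);
    }
}
```

Returning a list instead of printing directly makes the result easy to inspect or post-process. Note that the bracket alternative only covers the exact shape "(word delimiter word)" from the example, so other bracket contents would need a broader pattern.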
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | samabcde |
