Split String by | and numbers

Let's imagine I have the following strings:

String one = "123|abc|123abc";
String two = "123|ab12c|abc|456|abc|def";
String three = "123|1abc|1abc1|456|abc|wer";
String four = "123|abc|def|456|ghi|jkl|789|mno|pqr";

If I do a split on them I expect the following output:

one = ["123|abc|123abc"];
two = ["123|ab12c|abc", "456|abc|def"];
three = ["123|1abc|1abc1", "456|abc|wer"];
four = ["123|abc|def", "456|ghi|jkl", "789|mno|pqr"];

The string has the following structure:

Each string starts with one or more digits, followed by any number of groups, where each group is a | followed by one or more characters.

When the text after a | consists only of digits, it is considered the start of a new value.

More examples:

In - 123456|xxxxxx|zzzzzzz|xa2314|xzxczxc|1234|qwerty
Out - ["123456|xxxxxx|zzzzzzz|xa2314|xzxczxc", "1234|qwerty"]

I tried multiple variations of the following, but none of them work:

value.split( "\\|\\d+|\\d+" )


Solution 1:[1]

Instead of splitting, you can match the parts in the string:

\b\d+(?:\|(?!\d+(?:$|\|))[^|\r\n]+)*
  • \b A word boundary
  • \d+ Match 1+ digits
  • (?: Non-capturing group
    • \|(?!\d+(?:$|\|)) Match | and assert that what follows is not a digits-only field (digits running up to the next pipe or the end of the string)
    • [^|\r\n]+ Match 1+ chars other than a pipe or a newline
  • )* Close the non-capturing group and repeat it zero or more times (use + instead to require at least one pipe-delimited field, as the Java code below does)
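
The effect of the final quantifier is easiest to see on an input whose second field is digits only. The following is a minimal sketch, not part of the original answer: the class name QuantifierDemo, the helper findAll, and the input "123|456|abc" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the two patterns differ only in the final quantifier.
class QuantifierDemo {
    static final Pattern STAR =
            Pattern.compile("\\b\\d+(?:\\|(?!\\d+(?:$|\\|))[^|\\r\\n]+)*");
    static final Pattern PLUS =
            Pattern.compile("\\b\\d+(?:\\|(?!\\d+(?:$|\\|))[^|\\r\\n]+)+");

    // Collect every match of the pattern in the input, in order.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "123|456|abc"; // assumed input, not from the question
        System.out.println(findAll(STAR, input)); // [123, 456|abc]
        System.out.println(findAll(PLUS, input)); // [456|abc]
    }
}
```

With *, a leading number that has no fields of its own is still returned as a value. With +, it is skipped, so pick the quantifier according to whether such a bare number should count.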


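import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The trailing + (instead of *) requires at least one |field after the leading digits.
String regex = "\\b\\d+(?:\\|(?!\\d+(?:$|\\|))[^|\\r\\n]+)+";
String string = "123|abc|def|456|ghi|jkl|789|mno|pqr";
Pattern pattern = Pattern.compile(regex);
Matcher m = pattern.matcher(string);
List<String> matches = new ArrayList<>();

// Each match is one complete value.
while (m.find()) {
    matches.add(m.group());
}

for (String s : matches) {
    System.out.println(s);
}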

Output

123|abc|def
456|ghi|jkl
789|mno|pqr
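
If a split is still preferred over matching, one possible alternative is to split on a pipe only when a lookahead sees a digits-only field after it. String.split discards whatever the pattern matches, which is why a pattern that matches the digits themselves loses them. A lookahead is zero-width, so only the pipe is consumed. This is a sketch under that assumption, not part of the original answer; the class name SplitOnNumericField and the helper splitRecords are hypothetical.

```java
import java.util.Arrays;

// Hypothetical demo class showing a lookahead-based split.
class SplitOnNumericField {
    // Match a pipe only if it is followed by digits that run up to
    // the next pipe or the end of the string. The digits stay in the output.
    static final String BOUNDARY = "\\|(?=\\d+(?:\\||$))";

    static String[] splitRecords(String value) {
        return value.split(BOUNDARY);
    }

    public static void main(String[] args) {
        // Strings taken from the question.
        System.out.println(Arrays.toString(splitRecords("123|abc|123abc")));
        // [123|abc|123abc]
        System.out.println(Arrays.toString(splitRecords("123|ab12c|abc|456|abc|def")));
        // [123|ab12c|abc, 456|abc|def]
        System.out.println(Arrays.toString(splitRecords("123|abc|def|456|ghi|jkl|789|mno|pqr")));
        // [123|abc|def, 456|ghi|jkl, 789|mno|pqr]
    }
}
```

Like the * variant of the matching pattern, this returns a leading number with no fields of its own as a separate value.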

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1