Can someone explain how `split("(?<=\\G.{" + 2 + "})")` in Java splits the string into chunks of 2 characters each?
```java
public class SplitPairs {
    public static void main(String[] args) {
        String s = "()[]{}";
        String[] array = s.split("(?<=\\G.{" + 2 + "})");
        for (int i = 0; i < array.length; i++) {
            System.out.println(array[i]);
        }
    }
}
```
Output: `()`, `[]` and `{}`, each on its own line.
I just don't understand the regex.
Solution 1:[1]
Let's break it down:
- `"(?<=\\G.{" + 2 + "})"` is just a silly way to write `"(?<=\\G.{2})"`.
- `.{2}` is 'any 2 characters'. `.` is 'any character' (by default, any character except a line terminator), and `{2}` is 'the thing before this, exactly twice'.

Now for the fancypants features here:

- `\\G` is 'the end of the previous match' (or the start of the input, before there is a previous match). This only makes sense in the context of repeatedly calling `find()` on a matcher... and `split`. Thus, `\\G.{2}` matches exactly 2 characters that immediately follow the previous match.
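The effect of `\\G` is easiest to see by calling `find()` in a loop, both with and without the anchor. This is a minimal sketch. The names `GAnchorDemo` and `findAll` are illustrative and do not come from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GAnchorDemo {
    // Collects every substring that repeated Matcher.find() calls return.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Without \G, find() is free to skip ahead to any later digit.
        System.out.println(findAll("\\d", "12a3"));       // [1, 2, 3]
        // With \G, each match must start exactly where the previous one ended
        // (or at the start of the input), so the 'a' breaks the chain.
        System.out.println(findAll("\\G\\d", "12a3"));    // [1, 2]
        // \G.{2} therefore walks the string two characters at a time.
        System.out.println(findAll("\\G.{2}", "()[]{}")); // [(), [], {}]
    }
}
```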
The regexp you pass to `split` actually identifies the separator, and whatever it matches will not be part of the result. For example, `"a,b,c".split(",")` returns no commas. By the same logic, `.split("\\G.{2}")` would eat up every character and leave nothing but empty strings. Because `split` drops trailing empty strings, you would end up with an empty array. Which is why there's one more thing we didn't cover:
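The sketch below shows both cases from the paragraph above: `split` removes its separators, and a separator that consumes every character leaves nothing behind. The class name `SplitSeparatorDemo` is illustrative.

```java
import java.util.Arrays;

public class SplitSeparatorDemo {
    public static void main(String[] args) {
        // The commas are the separator, so none of them survive.
        String[] parts = "a,b,c".split(",");
        System.out.println(Arrays.toString(parts)); // [a, b, c]

        // Here the "separator" swallows every character. The pieces between
        // the matches are all empty, and split(regex) drops trailing empty
        // strings, so nothing at all is left.
        String[] eaten = "()[]{}".split("\\G.{2}");
        System.out.println(eaten.length); // 0
    }
}
```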
`?<=` is what's called a 'positive lookbehind'.
To explain that, we first need to delve into how regexes work. Regexes consist of a series of 'nodes'. Each node does 2 utterly unrelated things: it either matches or it doesn't, and it 'moves the pointer' (the cursor) some slots to the right. The regexp `"a"` consists of a single node. It matches if the thing at the cursor is exactly the character `a`, and it fails to match if there's anything else there. If it matches, it moves the cursor one to the right.
Some nodes don't move the cursor. They match or fail, but they don't 'consume' anything. For example, `^` matches if you're at the start of the string and fails otherwise, but it doesn't consume anything. 'Start of the string' is not a consumable concept because it has no width (there is no 'start' character). `\\b` (word boundary) is similar.
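A match's width is `end() - start()`. Measuring it shows the difference between consuming and non-consuming nodes. This is a minimal sketch; `ZeroWidthDemo` and `widthOfFirstMatch` are illustrative names. The `split` line assumes Java 8 or later, which omits the leading empty string that a zero-width match at index 0 would otherwise produce.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthDemo {
    // Returns how many characters the first match of regex consumes.
    static int widthOfFirstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            throw new IllegalArgumentException("no match");
        }
        return m.end() - m.start();
    }

    public static void main(String[] args) {
        System.out.println(widthOfFirstMatch("a", "abc"));   // 1: "a" consumes one character
        System.out.println(widthOfFirstMatch("^", "abc"));   // 0: "^" matches but consumes nothing
        System.out.println(widthOfFirstMatch("\\b", "abc")); // 0: so does "\b"

        // Because "\b" has no width, splitting on it loses no characters.
        System.out.println(Arrays.toString("hello world".split("\\b"))); // [hello,  , world]
    }
}
```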
Lookahead and lookbehind are tools to make your own non-consuming nodes. They consist of two things:
- Which lookX you want. There are 4 types: Positive/Negative Lookahead/Lookbehind.
- A regexp to match.
Negative means it fails if the regexp is there (positive means it fails if the regexp is not there). Lookahead means the regexp needs to match starting at the cursor, looking to the right. Lookbehind means it needs to match text that ends at the cursor, looking left.
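One way to see all 4 types side by side is to insert a marker at every position where each one matches. Since a lookaround consumes nothing, `replaceAll` only inserts text and never removes any. In this sketch, the `|` marker, the sample string `"ab12"` and the names `LookaroundDemo`/`mark` are arbitrary choices for illustration.

```java
public class LookaroundDemo {
    // Inserts "|" at every position where the zero-width pattern matches.
    static String mark(String input, String lookaround) {
        return input.replaceAll(lookaround, "|");
    }

    public static void main(String[] args) {
        String s = "ab12";
        System.out.println(mark(s, "(?=\\d)"));  // ab|1|2   positive lookahead: a digit follows
        System.out.println(mark(s, "(?!\\d)"));  // |a|b12|  negative lookahead: no digit follows
        System.out.println(mark(s, "(?<=\\d)")); // ab1|2|   positive lookbehind: a digit precedes
        System.out.println(mark(s, "(?<!\\d)")); // |a|b|12  negative lookbehind: no digit precedes
    }
}
```

Note that the negative forms also match at the very end or start of the string, where there is no character at all to be a digit.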
`(?<=` selects positive lookbehind, so the rest (`\\G.{2}`, up to the closing `)`) is the regexp to look for, and it no longer consumes anything.
Now you can put it all together:
This regexp consumes nothing, because it's entirely a positive lookbehind construction. That explains why all the characters are returned (your `array` variable). Remember, in the more usual `"a,b,c".split(",")` style, stuff disappears (the commas).
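Instead, this regexp matches the empty position after every 2 characters. In `"()[]{}"` those are positions 2 and 4. It also matches at the end of the string, but that only yields a trailing empty string, which `split` discards.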
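The sketch below prints where the question's regex matches, then splits with it. The helper `chunkRegex(n)` is a hypothetical generalization of the `"(?<=\\G.{" + 2 + "})"` concatenation to any chunk size. `matchPositions` is likewise just an illustrative name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PairSplitDemo {
    // Builds the same regex as the question, for any chunk size n.
    static String chunkRegex(int n) {
        return "(?<=\\G.{" + n + "})";
    }

    // Records where each (zero-width) match of the regex occurs.
    static List<Integer> matchPositions(String regex, String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // start() == end() for every match: nothing is consumed.
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        String regex = chunkRegex(2);
        System.out.println(regex.equals("(?<=\\G.{2})"));           // true
        System.out.println(matchPositions(regex, "()[]{}"));        // [2, 4, 6]
        System.out.println(Arrays.toString("()[]{}".split(regex))); // [(), [], {}]
        // An odd-length input leaves a shorter final piece.
        System.out.println(Arrays.toString("abcde".split(regex)));  // [ab, cd, e]
    }
}
```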
For what it's worth, this is stupidly convoluted. This is a horrible, horrible way to separate a string into consecutive character pairs. I think someone is trying to show off, or this is some sort of 'ha! Check out what regexes can do!' puzzler thing. Fair enough.
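For comparison, a plain `substring` loop does the same job without any regex. This is one possible alternative, not something the answer itself proposes. `PairChunker` and `chunks` are hypothetical names. There are two design differences. First, the loop counts UTF-16 `char`s, so it could split a surrogate pair that `.` would treat as one character. Second, it does not skip line terminators the way `.` does unless the `(?s)` (DOTALL) flag is set.

```java
import java.util.ArrayList;
import java.util.List;

public class PairChunker {
    // Splits s into consecutive pieces of n chars; the last piece may be shorter.
    static List<String> chunks(String s, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < s.length(); i += n) {
            result.add(s.substring(i, Math.min(i + n, s.length())));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(chunks("()[]{}", 2)); // [(), [], {}]
        System.out.println(chunks("abcde", 2));  // [ab, cd, e]
    }
}
```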
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | rzwitserloot |
