Scanner with a lookbehind delimiter does not work near the Scanner.BUFFER_SIZE boundary
If I use a Scanner with a delimiter that contains a lookbehind, like (?<=de)lim, the delimiter is not skipped when I construct my input so that the Scanner's internal buffer starts with "lim", even though in the source text that "lim" is preceded by "de":
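import java.io.StringReader;
import java.util.Scanner;

public class Scanning {
    final static int SCANNER_BUFFER_SIZE = 1024 * 2;
    final static int OFFSET = 5;
    // A run of '=' ending OFFSET chars before index SCANNER_BUFFER_SIZE, then "delimdelim", then 10 '='
    final static String DATA = "=".repeat(SCANNER_BUFFER_SIZE - OFFSET) + "delimdelim" + "=".repeat(10);

    public static void main(String[] args) {
        Scanner scanner = new Scanner(new StringReader(DATA));
        scanner.useDelimiter("(?<=de)lim"); // "lim", but only when preceded by "de"
        while (scanner.hasNext()) {
            // Collapse each run of '=' to a single '=' to keep the output readable
            System.out.println(scanner.next().replaceAll("=+", "="));
        }
    }
}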
I think that DATA should be split on these delimiters:
===(...)===delimdelim==========
             ^~~  ^~~
And so the output should be:
=de
de
=
However, this outputs (openjdk version "17.0.2" 2022-01-18):
=de
limde
=
I can see in the debugger that when the scanner is about to return "limde", scanner.buf and scanner.matcher.text contain "limdelim...", so I suspect that is the cause. If I change OFFSET to, e.g., 3 or 7, the expected behavior occurs.
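The suspicion above can be checked with java.util.regex alone, without Scanner. The sketch below (class and method names are hypothetical) shows that a lookbehind can only succeed if the characters it inspects are part of the text the Matcher sees: once the leading "de" is gone, as in the "limdelim..." text seen in the debugger, the first "lim" no longer matches. It also shows that a Matcher region with transparent bounds lets a lookbehind see text before the region start, which can only help while that text is still present. This illustrates regex semantics; it is not a reproduction of Scanner's internal buffer handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    static final Pattern DELIM = Pattern.compile("(?<=de)lim");

    // Start index of every match of DELIM in the whole text.
    static List<Integer> matchStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = DELIM.matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    // Searches only [from, text.length()); with transparent bounds the
    // lookbehind may inspect characters before 'from', with opaque bounds it may not.
    static List<Integer> matchStartsInRegion(String text, int from, boolean transparent) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = DELIM.matcher(text);
        m.region(from, text.length());
        m.useTransparentBounds(transparent);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Full text: each "lim" is preceded by "de".
        System.out.println(matchStarts("delimdelim"));                   // [2, 7]
        // Leading "de" discarded: the first "lim" has nothing before it.
        System.out.println(matchStarts("limdelim"));                     // [5]
        // Region starts at the first "lim", but the "de" is still in the text.
        System.out.println(matchStartsInRegion("delimdelim", 2, true));  // [2, 7]
        // Opaque bounds (the default) hide the "de" before the region.
        System.out.println(matchStartsInRegion("delimdelim", 2, false)); // [7]
    }
}
```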
I could not find any reference to this behavior in the documentation of Scanner or Pattern, so is this intended? Is it not safe to use lookaround for the delimiter of a Scanner?
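One way to sidestep the issue, assuming the whole input fits in memory, is to avoid Scanner's incremental buffer entirely: read everything into a String and split it with Pattern.splitAsStream, so the lookbehind always sees the full preceding text. This is a workaround sketch, not an answer to whether Scanner's behavior is intended; the helper name splitWhole is hypothetical, and its handling of empty tokens (for example, a delimiter at the very start of the input) may differ from Scanner's.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WholeTextSplit {
    // Hypothetical helper: reads the entire input first, so the delimiter's
    // lookbehind is never cut off by a buffer boundary.
    static List<String> splitWhole(Reader in, Pattern delimiter) {
        StringWriter all = new StringWriter();
        try {
            in.transferTo(all); // Reader.transferTo exists since Java 10
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return delimiter.splitAsStream(all.toString()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        final int size = 1024 * 2;
        final int offset = 5;
        String data = "=".repeat(size - offset) + "delimdelim" + "=".repeat(10);
        for (String token : splitWhole(new StringReader(data), Pattern.compile("(?<=de)lim"))) {
            // Collapse each run of '=' to a single '=' for display
            System.out.println(token.replaceAll("=+", "="));
        }
    }
}
```

For the DATA from the question this prints =de, de and = on separate lines, which is the output expected above.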
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
