Scanner with a lookbehind delimiter does not work near the Scanner.BUFFER_SIZE boundary

If I use a Scanner with a delimiter that contains a lookbehind, such as (?<=de)lim, the delimiter is not skipped when I construct the input so that the Scanner's internal buffer starts with "lim", even though that "lim" is preceded by "de" in the source text:

import java.io.StringReader;
import java.util.Scanner;

public class Scanning {
    final static int SCANNER_BUFFER_SIZE = 1024 * 2;
    final static int OFFSET = 5;
    final static String DATA = "=".repeat(SCANNER_BUFFER_SIZE - OFFSET) + "delimdelim" + "=".repeat(10);

    public static void main(String[] args) {
        Scanner scanner = new Scanner(new StringReader(DATA));

        scanner.useDelimiter("(?<=de)lim");

        while(scanner.hasNext()) {
            System.out.println(scanner.next().replaceAll("=+", "="));
        }
    }
}

I think that DATA should be split on these delimiters:

===(...)===delimdelim==========
             ^~~  ^~~

And so the output should be:

=de
de
=
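As a cross-check of that expectation, the same pattern can be applied to the whole string at once with Pattern.split, which involves no incremental buffering. This is only a minimal sketch; the class name SplitCheck and the helper splitWhole are names made up for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SplitCheck {
    // Hypothetical helper: split the complete string on the lookbehind
    // delimiter, then collapse runs of '=' exactly as the Scanner program does.
    static List<String> splitWhole(String data) {
        return Arrays.stream(Pattern.compile("(?<=de)lim").split(data))
                .map(token -> token.replaceAll("=+", "="))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Same layout as DATA: 2048 - 5 padding characters, two delimiters, 10 more.
        String data = "=".repeat(2048 - 5) + "delimdelim" + "=".repeat(10);
        splitWhole(data).forEach(System.out::println);
    }
}
```

Because the matcher sees the entire input here, both occurrences of "lim" have their preceding "de" available, and the three tokens come out as =de, de and =.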

However, the Scanner program actually prints (openjdk version "17.0.2" 2022-01-18):

=de
limde
=

I can see in the debugger that when the scanner is about to return "limde", scanner.buf and scanner.matcher.text contain "limdelim...", so I suspect that is the cause: the "de" that the lookbehind needs is no longer in the buffer. If I change OFFSET to e.g. 3 or 7, I get the output I expect.
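The effect I suspect can be reproduced without a Scanner. The following is a minimal sketch, not a description of how Scanner is actually implemented; the class LookbehindContext and its two helper methods are hypothetical names. It shows that a lookbehind cannot succeed when the text handed to the matcher no longer contains the preceding characters, and that Matcher.useTransparentBounds only helps while those characters are still part of the underlying sequence:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindContext {
    static final Pattern DELIM = Pattern.compile("(?<=de)lim");

    // The matcher only receives the text from 'start' onwards, similar to a
    // buffer whose earlier characters have already been discarded.
    // Returns the match index relative to the full text, or -1.
    static int firstMatchInSubstring(String text, int start) {
        Matcher m = DELIM.matcher(text.substring(start));
        return m.find() ? m.start() + start : -1;
    }

    // The matcher receives the full text but is restricted to a region.
    // With transparent bounds the lookbehind may inspect text before 'start'.
    static int firstMatchInRegion(String text, int start, boolean transparent) {
        Matcher m = DELIM.matcher(text);
        m.region(start, text.length());
        m.useTransparentBounds(transparent);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String text = "delimdelim";
        // "de" is cut off: the first "lim" is missed, the second one is found.
        System.out.println(firstMatchInSubstring(text, 2));      // 7
        // Opaque region bounds (the Matcher default) behave the same way.
        System.out.println(firstMatchInRegion(text, 2, false));  // 7
        // Transparent bounds let the lookbehind see the "de" before the region.
        System.out.println(firstMatchInRegion(text, 2, true));   // 2
    }
}
```

The first case mirrors what the debugger shows: once the matcher's text begins with "lim", nothing precedes it for (?<=de) to inspect, so that "lim" is treated as ordinary token content.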

I could not find any reference to this behavior in the documentation of Scanner or Pattern, so is this intended? Is it not safe to use lookaround for the delimiter of a Scanner?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
