Canonical equivalence in Pattern

I am using the regex test harness from the Java Tutorials: http://docs.oracle.com/javase/tutorial/essential/regex/test_harness.html

The only change I made to the class is how the pattern is created:

Pattern pattern = 
        Pattern.compile(console.readLine("%nEnter your regex(Pattern.CANON_EQ set): "),Pattern.CANON_EQ);

Following the tutorial at http://docs.oracle.com/javase/tutorial/essential/regex/pattern.html, I entered a\u030A as the regex and \u00E5 as the string to match, but the harness reports "No match found". As far as I can tell, both strings represent a lowercase 'a' with a ring above.

Have I misunderstood the use case?



Solution 1:[1]

The behavior you're seeing has nothing to do with the Pattern.CANON_EQ flag.

Input read from the console is not the same as a Java string literal. When the user (presumably you, testing out this flag) types \u00E5 into the console, the resultant string read by console.readLine is equivalent to "\\u00E5", not "å". See for yourself: http://ideone.com/lF7D1
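A minimal sketch of the point above, assuming the console returns exactly the characters typed. The string literal "\\u00E5" stands in for what console.readLine hands back. The method decodeUnicodeEscapes is a hypothetical helper (not part of the JDK) that replaces each backslash-u-plus-four-hex-digits sequence with the character it names; it is one possible way to let the harness accept escapes typed at the console. The class name is arbitrary.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConsoleEscapeDemo {
    // Hypothetical helper: turns each backslash + 'u' + 4 hex digits into that UTF-16 char.
    static String decodeUnicodeEscapes(String s) {
        Matcher m = Pattern.compile("\\\\u([0-9A-Fa-f]{4})").matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Simulates typing the six characters: backslash, u, 0, 0, E, 5
        String typed = "\\u00E5";
        System.out.println(typed.length());            // 6
        System.out.println(typed.equals("\u00E5"));    // false

        // After decoding, it is the single precomposed character
        String decoded = decodeUnicodeEscapes(typed);
        System.out.println(decoded.length());          // 1
        System.out.println(decoded.equals("\u00E5"));  // true
    }
}
```

In the harness, the decoded string (rather than the raw readLine result) would then be passed to pattern.matcher(...).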

As for Pattern.CANON_EQ, it behaves exactly as described:

import java.util.regex.Pattern;

public class CanonEqDemo {
    public static void main(String[] args) {
        Pattern withCE = Pattern.compile("^a\u030A$", Pattern.CANON_EQ);
        Pattern withoutCE = Pattern.compile("^a\u030A$");
        String input = "\u00E5";

        System.out.println("Matches with canon eq: "
            + withCE.matcher(input).matches()); // true
        System.out.println("Matches without canon eq: "
            + withoutCE.matcher(input).matches()); // false
    }
}

http://ideone.com/nEV1V
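Why the two spellings match under CANON_EQ: U+00E5 (LATIN SMALL LETTER A WITH RING ABOVE) is canonically equivalent to U+0061 followed by U+030A (COMBINING RING ABOVE), so the strings differ as char sequences but normalize to the same form. The sketch below checks this with java.text.Normalizer. Normalizing both a literal pattern and the input to NFC before matching is shown only as an illustrative alternative to the flag, not as something the answer above recommends; the class name and the nfc helper are arbitrary.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class NormalizeDemo {
    // NFC composes 'a' + U+030A into the single precomposed char U+00E5
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "a\u030A";   // two chars: 'a' + COMBINING RING ABOVE
        String precomposed = "\u00E5";   // one char: LATIN SMALL LETTER A WITH RING ABOVE

        // Different char sequences, so a plain comparison fails
        System.out.println(decomposed.equals(precomposed));            // false
        // Both normalize to the same NFC string
        System.out.println(nfc(decomposed).equals(nfc(precomposed)));  // true

        // Alternative to CANON_EQ for a literal pattern: normalize both sides first
        Pattern p = Pattern.compile("^" + Pattern.quote(nfc(decomposed)) + "$");
        System.out.println(p.matcher(nfc(precomposed)).matches());     // true
    }
}
```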

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1