Regular Expression in Java: Pattern.compile( "J.*\\d[0-35-9]-\\d\\d-\\d\\d" )

I found some Java regular expression code that confuses me:

Pattern.compile( "J.*\\d[0-35-9]-\\d\\d-\\d\\d" );

The string it is matched against is:

String string1 = "Jane's Birthday is 05-12-75\n" + "Dave's Birthday is 11-04-68\n" + "John's Birthday is 04-28-73\n" + "Joe's Birthday is 12-17-77";

What does this part mean:

[0-35-9]

And why are there 4 "\d"s instead of 3? I assume there are only 3 numbers in the birthday.



Solution 1:[1]

\\d does not match a number; it matches a single digit. That is why \\d\\d is needed to match two consecutive digits.

[0-35-9] will match a digit in the range 0-3 or a digit in the range 5-9.

The practical upshot is that, for valid dates, this matches a birthday where the month is 01, 02, 03, 05, 06, 07, 08, 09, 10, 11, or 12. The day and year don't matter provided they are two digits each. It is a very long-winded way of saying "find me any birthday that was not in April (04)".
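To see this in action, here is a minimal sketch that runs the pattern from the question against string1. The class name BirthdayMatchDemo and the helper findMatches are made up for illustration; only Pattern, Matcher, and the pattern itself come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original code.
class BirthdayMatchDemo {
    // Pattern from the question: 'J', any characters, a digit,
    // a digit other than 4, then "-dd-dd".
    static final Pattern PATTERN = Pattern.compile("J.*\\d[0-35-9]-\\d\\d-\\d\\d");

    // Collects every substring of the input that the pattern matches.
    static List<String> findMatches(String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String string1 = "Jane's Birthday is 05-12-75\n"
                + "Dave's Birthday is 11-04-68\n"
                + "John's Birthday is 04-28-73\n"
                + "Joe's Birthday is 12-17-77";
        // Prints the lines for Jane and Joe.
        for (String match : findMatches(string1)) {
            System.out.println(match);
        }
    }
}
```

Dave's line is skipped because it contains no J, and John's line is skipped because the month is 04. By default the dot does not match line terminators, so each match stays within a single line.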

Solution 2:[2]

What does [0-35-9] mean?

It is a character class: a set of characters enclosed in square brackets, any one of which will match a single character of the input string. This particular class matches if the character is 0 through 3 or 5 through 9, inclusive.
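A quick way to convince yourself is to test each digit against the class on its own. This is only a sketch; the names DigitClassDemo and matchesClass are invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original code.
class DigitClassDemo {
    // Two ranges inside one class: '0' through '3' and '5' through '9'.
    static final Pattern NOT_FOUR = Pattern.compile("[0-35-9]");

    // True if the single character c is accepted by the class.
    static boolean matchesClass(char c) {
        return NOT_FOUR.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // Only '4' is reported as false.
        for (char c = '0'; c <= '9'; c++) {
            System.out.println(c + " -> " + matchesClass(c));
        }
    }
}
```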

And why are there 4 "\d"s instead of 3? I assume there are only 3 numbers in the birthday.

The birthday portion of your string looks like this: Birthday is 05-12-75

\d is a predefined character class that represents a single digit, so \d\d represents two consecutive digits. Hence for a date of the form xx-xx-xx we would write \\d\\d-\\d\\d-\\d\\d (the backslash is doubled inside a Java string literal), where x stands for a digit (0-9).

Solution 3:[3]

The confusion arises from the way we perceive numbers. To our mathematical eye, it looks as if the middle section is a single number, the number "35". But in actuality, it is two separate digits, a "3" and a "5". As has been answered in depth previously, this is actually two ranges, the range of digits from 0 through 3 inclusive and the range 5 through 9 inclusive, thus eliminating 4 from the possible digits it will match.

As to the number of "\d"s, there are actually 5, not 4. The first one pairs with a single digit from the ranges of digits to match a month (for example, October is 10 and June is 06, so both match, while April, which is 04, does not). The next two "\d"s pair up to form the day. The last two pair up to form the year.
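The pairing described above can be made visible by adding capturing groups around each pair. The parentheses, the class name BirthdayPartsDemo, and the helper parts are additions for illustration only; the original pattern has no groups.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original code.
class BirthdayPartsDemo {
    // Same pattern as in the question, with parentheses added so that
    // month, day, and year can be read back separately.
    static final Pattern PATTERN =
            Pattern.compile("J.*(\\d[0-35-9])-(\\d\\d)-(\\d\\d)");

    // Returns {month, day, year} for the first match, or null if nothing matches.
    static String[] parts(String line) {
        Matcher m = PATTERN.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] p = parts("Joe's Birthday is 12-17-77");
        // Prints: month=12 day=17 year=77
        System.out.println("month=" + p[0] + " day=" + p[1] + " year=" + p[2]);
    }
}
```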

Solution 4:[4]

The answers above are correct, but I think there is a problem with the first two digits of the date.

(The month value should be one of 01, 02, 03, 05, 06, ..., 12.)

\\d[0-35-9]

This regex accepts every month except April, but it does not restrict the value to the range 01-12 at the same time.

So the correct regex should be the one below:

(0[1-35-9]|1[0-2])
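As a rough comparison, the sketch below checks a few two-digit values against the original month part and against a stricter alternation. The strict pattern here uses 0[1-35-9] for its first alternative so that 00 is rejected as well; a class starting at 0 would let 00 through. The names MonthCheckDemo, loose, and strict are invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original code.
class MonthCheckDemo {
    // Month part of the pattern in the question: any digit, then any digit but 4.
    static final Pattern LOOSE = Pattern.compile("\\d[0-35-9]");
    // Stricter variant: 01-03 or 05-09, or 10-12.
    static final Pattern STRICT = Pattern.compile("(0[1-35-9]|1[0-2])");

    static boolean loose(String mm) {
        return LOOSE.matcher(mm).matches();
    }

    static boolean strict(String mm) {
        return STRICT.matcher(mm).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "00", "04", "09", "12", "13", "99" };
        for (String mm : samples) {
            System.out.println(mm + ": loose=" + loose(mm) + " strict=" + strict(mm));
        }
    }
}
```

Both patterns reject 04, but only the strict one also rejects values such as 00, 13, and 99.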

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Chris Hayes
Solution 2
Solution 3 jamesc1101
Solution 4 babeyh