Regular Expression in Java: Pattern.compile( "J.*\\d[0-35-9]-\\d\\d-\\d\\d" )
I found some Java regular expression code that confuses me:
Pattern.compile( "J.*\\d[0-35-9]-\\d\\d-\\d\\d" );
The string the pattern is matched against is:
String string1 = "Jane's Birthday is 05-12-75\n" + "Dave's Birthday is 11-04-68\n" + "John's Birthday is 04-28-73\n" + "Joe's Birthday is 12-17-77";
What does `[0-35-9]` mean?
And why are there 4 `\d`s instead of 3? I assume there are only 3 numbers in the birthday.
Solution 1:[1]
`\\d` does not match a number, it matches a digit. The distinction is that `\\d\\d` will match two consecutive digits.
`[0-35-9]` will match a digit in the range `0-3` or a digit in the range `5-9`.
The practical upshot is that this matches a birthday where the month is 10, 11, 12, 01, 02, 03, 05, 06, 07, 08, or 09. The day and year don't matter provided they are two digits each. It is a very long-winded way of saying "find me any birthday that was not in April (`04`)".
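To see this behaviour concretely, here is a minimal sketch that runs the pattern from the question against the question's input. The class name `BirthdayRegexDemo` and the helper `findMatches` are made up for illustration; only `Pattern`, `Matcher`, and the pattern string come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BirthdayRegexDemo {
    // The pattern from the question: a 'J', any characters, then a date
    // whose month has any first digit and a second digit other than 4.
    static final Pattern PATTERN = Pattern.compile("J.*\\d[0-35-9]-\\d\\d-\\d\\d");

    // Hypothetical helper: collects every substring the pattern matches.
    static List<String> findMatches(String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String string1 = "Jane's Birthday is 05-12-75\n"
                + "Dave's Birthday is 11-04-68\n"
                + "John's Birthday is 04-28-73\n"
                + "Joe's Birthday is 12-17-77";
        for (String match : findMatches(string1)) {
            System.out.println(match);
        }
    }
}
```

This prints the lines for Jane and Joe. Dave's line is skipped because it contains no `J`, and John's line is skipped because his month is `04`. Each match stays within one line because `.` does not match line terminators by default.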
Solution 2:[2]
What does `[0-35-9]` mean?
Square brackets define a character class: a set of characters, any one of which will match a single character of the input string. This particular class matches if that character is `0` through `3`, or `5` through `9`, inclusive.
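As a quick check of that reading, the following sketch tests each digit against the class on its own. The class name `CharacterClassDemo` and the helper `inClass` are hypothetical names used only for this example.

```java
class CharacterClassDemo {
    // Hypothetical helper: true if the single character is accepted by [0-35-9].
    static boolean inClass(char c) {
        return String.valueOf(c).matches("[0-35-9]");
    }

    public static void main(String[] args) {
        // Try every digit from 0 to 9 against the character class.
        for (char c = '0'; c <= '9'; c++) {
            System.out.println(c + " -> " + inClass(c));
        }
    }
}
```

Every digit reports `true` except `4`, which falls in the gap between the two ranges.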
And why are there 4 `\d`s instead of 3? I assume there are only 3 numbers in the birthday.
The date portion of your birthday string looks like `05-12-75`.
`\d` is a predefined character class that represents a single digit, so `\d\d` represents two consecutive digits. Hence for a date of the form `xx-xx-xx`, where `x` is assumed to represent a digit (`0-9`), we would write `\\d\\d-\\d\\d-\\d\\d` in a Java string literal.
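The point that each `\d` consumes exactly one digit can be sketched as follows, assuming the three-group birthday format from the question. The class name `DigitPairsDemo` and the helper `looksLikeDate` are illustrative names, not part of the original code.

```java
import java.util.regex.Pattern;

class DigitPairsDemo {
    // Three groups of exactly two digits, separated by hyphens.
    static final Pattern DATE = Pattern.compile("\\d\\d-\\d\\d-\\d\\d");

    // Hypothetical helper: true only if the whole string has that shape.
    static boolean looksLikeDate(String s) {
        return DATE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDate("05-12-75"));   // two digits in every group
        System.out.println(looksLikeDate("5-12-75"));    // first group has only one digit
        System.out.println(looksLikeDate("05-12-1975")); // last group has four digits
    }
}
```

Only the first call returns `true`, because `matches()` requires the entire input to fit the pattern.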
Solution 3:[3]
The confusion arises from the way we perceive numbers. To our mathematical eye, the middle section looks like a single number, "35". In fact it is two separate digits, a "3" and a "5". As has been answered in depth previously, this is actually two ranges, the range of digits from 0 through 3 inclusive and the range 5 through 9 inclusive, which eliminates 4 from the digits it will match.
As to the number of `\d`s, there are actually 5, not 4. The first one pairs with the single digit from the character class `[0-35-9]` to match a month (for example, October is 10 and June is 06, so both match, while April, which is 04, does not). The next two `\d`s pair up to be a day. The last two pair up to make a year.
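One way to see this pairing is to wrap each pair in a capturing group. This is a sketch rather than the original code: the parentheses, the class name `DatePartsDemo`, and the helper `dateParts` are additions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DatePartsDemo {
    // Same date portion as the question's pattern, with a group per pair:
    // group 1 = month (\d + [0-35-9]), group 2 = day, group 3 = year.
    static final Pattern PARTS = Pattern.compile("(\\d[0-35-9])-(\\d\\d)-(\\d\\d)");

    // Hypothetical helper: returns {month, day, year} for the first date
    // found in the line, or null if no date matches.
    static String[] dateParts(String line) {
        Matcher m = PARTS.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = dateParts("Joe's Birthday is 12-17-77");
        System.out.println("month=" + parts[0] + " day=" + parts[1] + " year=" + parts[2]);
        // April (04) is rejected, so no date is found in this line.
        System.out.println(dateParts("John's Birthday is 04-28-73") == null);
    }
}
```

The five `\d`s plus the one character class account for all six digits of the date.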
Solution 4:[4]
The explanation above is correct, but I think there is a problem with the first two digits of the date.
(The month value should be one of 01, 02, 03, 05, 06, ..., 12.)
\\d[0-35-9]
This regex excludes April, but it does not restrict the month to the range 01 to 12: it also accepts values such as 25 or 99.
So a stricter regex for the month would be:
(0[1-35-9]|1[0-2])
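The difference can be sketched by comparing the original month pattern with a stricter one. This example assumes the month group `(0[1-35-9]|1[0-2])`, which accepts 01 to 12 except 04 and rejects 00. The class name `StrictMonthDemo` and both helper names are hypothetical.

```java
import java.util.regex.Pattern;

class StrictMonthDemo {
    // Date portion as written in the question: any first digit for the month.
    static final Pattern LOOSE = Pattern.compile("\\d[0-35-9]-\\d\\d-\\d\\d");
    // Assumed stricter variant: month must be 01-12, still excluding 04.
    static final Pattern STRICT = Pattern.compile("(0[1-35-9]|1[0-2])-\\d\\d-\\d\\d");

    static boolean looseMatch(String date) {
        return LOOSE.matcher(date).matches();
    }

    static boolean strictMatch(String date) {
        return STRICT.matcher(date).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "05-12-75", "04-28-73", "25-12-75", "00-12-75" };
        for (String date : samples) {
            System.out.println(date + " loose=" + looseMatch(date)
                    + " strict=" + strictMatch(date));
        }
    }
}
```

Both patterns accept `05-12-75` and reject `04-28-73`, but only the loose one accepts the impossible months `25` and `00`. Neither pattern validates the day or the year.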
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Chris Hayes |
Solution 2 | |
Solution 3 | jamesc1101 |
Solution 4 | babeyh |