Finding dates in text using regex
I want to find all dates in a text, except those that are immediately preceded by the word Effective. For example, I have the following line:
FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 ALASKA DISCLAIMER The January 5, 2022
My regex should return ['January 7, 2022', 'January 5, 2022']
How can I do this in Python?
My attempt:
>>> import re
>>> rule = '((?<!Effective\ )([A-Za-z]{3,9}\ *\d{1,2}\ *,\ *\d{4}))'
>>> text = 'FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 ALASKA DISCLAIMER The January 5, 2022'
>>> re.findall(rule, text)
[('anuary 1, 2022', 'anuary 1, 2022'), ('January 7, 2022', 'January 7, 2022'), ('January 5, 2022', 'January 5, 2022')]
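But it doesn't work: the date after Effective is still matched, only without its leading J, and every match is returned twice as a tuple.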
Solution 1:[1]
You can use
\b(?<!Effective\s)[A-Za-z]{3,9}\s*\d{1,2}\s*,\s*\d{4}(?!\d)
Details:
\b - a word boundary
(?<!Effective\s) - a negative lookbehind that fails the match if there is Effective + a whitespace char immediately to the left of the current location
[A-Za-z]{3,9} - three to nine ASCII letters
\s* - zero or more whitespaces
\d{1,2} - one or two digits
\s*,\s* - a comma enclosed with zero or more whitespaces
\d{4} - four digits
(?!\d) - a negative lookahead that fails the match if there is a digit immediately on the right.
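As a minimal sketch, the pattern can be applied in Python like this. The names DATE_NOT_AFTER_EFFECTIVE and find_dates are hypothetical and used only for illustration; the pattern and the sample text are the ones from the question and answer.

```python
import re

# Pattern from the answer above; a raw string keeps the backslashes intact.
# DATE_NOT_AFTER_EFFECTIVE is an illustrative name, not part of the original answer.
DATE_NOT_AFTER_EFFECTIVE = re.compile(
    r'\b(?<!Effective\s)[A-Za-z]{3,9}\s*\d{1,2}\s*,\s*\d{4}(?!\d)'
)


def find_dates(text):
    """Return date-like substrings not immediately preceded by 'Effective' + whitespace."""
    return DATE_NOT_AFTER_EFFECTIVE.findall(text)


text = ('FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 '
        'ALASKA DISCLAIMER The January 5, 2022')

print(find_dates(text))
# ['January 7, 2022', 'January 5, 2022']
```

Because the pattern has no capturing groups, re.findall returns plain strings rather than tuples. The \b is what prevents the partial match anuary 1, 2022: without it, the lookbehind fails at J but the regex engine can still start a match one character later, at a.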
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wiktor Stribiżew |
