Python Date cleaner using regex

I've been trying to write a script that takes date inputs like 3/14/2015, 03-14-2015, and 2015/3/14 (using pyperclip to copy and paste) and converts them to a single format. So far this is what I've accomplished:

import re,pyperclip

dateRegex_0 = re.compile(r'''(
    #0) 3/14/2015
        (\d{1,2})             
        (-|\/|\.)           
        (\d{2})
        (-|\/|\.) 
        (\d{4})
        )''',re.VERBOSE)

dateRegex_1 = re.compile(r'''(
    #1)03-14-2015
        (\d{2})             
        (-|\/|\.)           
        (\d{2})
        (-|\/|\.) 
        (\d{4})
        )''',re.VERBOSE)

dateRegex_2 = re.compile(r'''(
    #2)2015/3/14 , format YYYY/MM/DD
        (\d{4})             
        (-|\/|\.)           
        (\d{1,2})
        (-|\/|\.) 
        (\d{1,2})
        )''',re.VERBOSE)

text=str(pyperclip.paste())
matches = []
for groups in dateRegex_0.findall(text):
        cleanDate = '-'.join([groups[3],groups[1],groups[5]])
        matches.append(cleanDate)

for groups in dateRegex_1.findall(text):

        cleanDate = '-'.join([groups[3],groups[1],groups[5]])
        matches.append(cleanDate)

for groups in dateRegex_2.findall(text):
        cleanDate = '-'.join([groups[5],groups[3],groups[1]])
        matches.append(cleanDate)


if len(matches)>0:
    pyperclip.copy('\n'.join(matches))
    print('Copied to clipboard:')
    print('\n'.join(matches))
else:
    print('There are no dates in your text!')

I managed to create a regex for each date type, and the code converts the dates to this format: DD-MM-YYYY. However, I have two problems:

  1. When I clean dates like 3/14/2015 and 03-14-2015, I get this output:

    14-3-2015, 14-03-2015. I want to either drop the 0 that appears before single-digit months or add a 0 before every one of them (basically, I want all of my cleaned dates to have the same format).

  2. How can I write a single regex that identifies all of my date types, instead of needing three separate ones (dateRegex_0, dateRegex_1, dateRegex_2)?
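
For the first problem, a minimal sketch of the two options (pad every day and month to two digits, or strip the leading zeros) using only built-in string methods. The helper names pad_date and strip_date are hypothetical and not part of the original script:

```python
def pad_date(day, month, year):
    """Join date parts as DD-MM-YYYY, zero-padding day and month to two digits."""
    return '-'.join([day.zfill(2), month.zfill(2), year])


def strip_date(day, month, year):
    """Join date parts as D-M-YYYY, dropping leading zeros from day and month."""
    return '-'.join([str(int(day)), str(int(month)), year])


print(pad_date('14', '3', '2015'))     # 14-03-2015
print(strip_date('14', '03', '2015'))  # 14-3-2015
```

In the loops above, either helper could stand in for the '-'.join(...) call, for example pad_date(groups[3], groups[1], groups[5]).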



Solution 1:[1]

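One idea: use a loose regex to pull out candidate strings, then let dateparser decide which of them are real dates.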

import re
#pip install dateparser (if required)
import dateparser

# fairly crude pattern: 1-4 digits, then / or -, then 1-4 digits, then / or -, then 1-4 digits
pattern = r'(\d{1,4}(?:/|-)\d{1,4}(?:/|-)\d{1,4})'

# sample input, treated as plain text (it could come from the clipboard)
data = '''
import dateparser

dates = ['1/14/2016', '05-14-2017', '2014/3/18', '2015-06-14 00:00:00', '13-13-2000000']

for date in dates:
    print(dateparser.parse(date))
'''

# collect every substring that matches the pattern above
extracted_dates = re.findall(pattern, data)

# print each match only if dateparser can parse it as a date;
# '13-13-2000' (matched inside '13-13-2000000') fits the regex, but dateparser returns None for it
for date in extracted_dates:
    parsed = dateparser.parse(date)
    if parsed is not None:
        print(parsed)

Outputs:

2016-01-14 00:00:00
2017-05-14 00:00:00
2014-03-18 00:00:00
2015-06-14 00:00:00
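
If installing dateparser is not an option, the same idea can be sketched with the standard library alone: one regex with two alternatives (year-first and month-first), followed by a validity check through datetime. This is only a sketch under assumptions taken from the question: dates with the year last are read as MM/DD/YYYY, the separators are -, / or ., and the output format is DD-MM-YYYY. The names DATE_RE and clean_dates are hypothetical.

```python
import re
from datetime import datetime

# One pattern for both layouts. Python's re module does not allow the same
# group name twice, so each alternative uses its own names.
DATE_RE = re.compile(r'''
    (?<!\d)                 # not preceded by another digit
    (?:
        (?P<y1>\d{4}) [-/.] (?P<m1>\d{1,2}) [-/.] (?P<d1>\d{1,2})   # YYYY/MM/DD
      |
        (?P<m2>\d{1,2}) [-/.] (?P<d2>\d{1,2}) [-/.] (?P<y2>\d{4})   # MM/DD/YYYY
    )
    (?!\d)                  # not followed by another digit
''', re.VERBOSE)


def clean_dates(text):
    """Return every valid date found in text, formatted as DD-MM-YYYY."""
    cleaned = []
    for m in DATE_RE.finditer(text):
        # Only one alternative takes part in a match; the other groups are None.
        year = m.group('y1') or m.group('y2')
        month = m.group('m1') or m.group('m2')
        day = m.group('d1') or m.group('d2')
        try:
            parsed = datetime(int(year), int(month), int(day))
        except ValueError:
            continue  # not a real calendar date, e.g. month 13
        cleaned.append(f'{parsed.day:02d}-{parsed.month:02d}-{parsed.year:04d}')
    return cleaned


sample = 'Due 3/14/2015, 03-14-2015 and 2015/3/14; ignore 13-13-2000 and 13-13-2000000.'
print(clean_dates(sample))  # ['14-03-2015', '14-03-2015', '14-03-2015']
```

The digit lookarounds keep the pattern from matching part of a longer number such as 13-13-2000000, and the datetime constructor rejects impossible dates such as 13-13-2000. Note that the two separators are not required to be the same, so a string like 3/14-2015 would also match.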

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1