Python date cleaner using regex
I've been trying to write a script that takes dates such as 3/14/2015, 03-14-2015 and 2015/3/14 (copied and pasted via pyperclip) and converts them to a single format. So far, this is what I have:
import re,pyperclip
dateRegex_0 = re.compile(r'''(
#0) 3/14/2015
(\d{1,2})
(-|\/|\.)
(\d{2})
(-|\/|\.)
(\d{4})
)''',re.VERBOSE)
dateRegex_1 = re.compile(r'''(
#1)03-14-2015
(\d{2})
(-|\/|\.)
(\d{2})
(-|\/|\.)
(\d{4})
)''',re.VERBOSE)
dateRegex_2 = re.compile(r'''(
#2)2015/3/14 , format YYYY/MM/DD
(\d{4})
(-|\/|\.)
(\d{1,2})
(-|\/|\.)
(\d{1,2})
)''',re.VERBOSE)
text = str(pyperclip.paste())
matches = []
for groups in dateRegex_0.findall(text):
    cleanDate = '-'.join([groups[3], groups[1], groups[5]])
    matches.append(cleanDate)
for groups in dateRegex_1.findall(text):
    cleanDate = '-'.join([groups[3], groups[1], groups[5]])
    matches.append(cleanDate)
for groups in dateRegex_2.findall(text):
    cleanDate = '-'.join([groups[5], groups[3], groups[1]])
    matches.append(cleanDate)
if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('Copied to clipboard:')
    print('\n'.join(matches))
else:
    print('There are no dates in your text!')
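A quick way to see why the loops pick indices 1, 3 and 5 is to look at what `findall` returns. The sketch below re-creates `dateRegex_0` and `dateRegex_1` without `re.VERBOSE` (same groups; the lowercase names are just for this illustration) and runs them on the sample text `3/14/2015, 03-14-2015`. Because each pattern is wrapped in an outer group, element 0 of every tuple is the whole date, 1/3/5 are month/day/year and 2/4 are the separators. The month string is copied through exactly as written, which is where the mixed `3` / `03` comes from. It also shows that `\d{1,2}` lets `dateRegex_0` match `03-14-2015` as well, so that date is collected by two of the loops.

```python
import re

# Re-creations of dateRegex_0 and dateRegex_1 from the question, without re.VERBOSE
date_regex_0 = re.compile(r'((\d{1,2})(-|/|\.)(\d{2})(-|/|\.)(\d{4}))')
date_regex_1 = re.compile(r'((\d{2})(-|/|\.)(\d{2})(-|/|\.)(\d{4}))')

text = '3/14/2015, 03-14-2015'

# Each match is a 6-tuple: (whole date, month, sep, day, sep, year)
found_0 = date_regex_0.findall(text)
found_1 = date_regex_1.findall(text)

print(found_0)
# [('3/14/2015', '3', '/', '14', '/', '2015'), ('03-14-2015', '03', '-', '14', '-', '2015')]
print(found_1)
# [('03-14-2015', '03', '-', '14', '-', '2015')]
```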
I wrote one regex for each date type, and the code converts every match to DD-MM-YYYY. However, I have two problems:
1. When I clean dates like these:
   3/14/2015, 03-14-2015
   I get this output:
   14-3-2015, 14-03-2015
   I want to either get rid of the 0 that appears before single-digit months or add a 0 before all of them; basically, I want all of my cleaned dates to have the same format.
2. How can I write a single regex that identifies all of my date types, instead of needing three separate ones (dateRegex_0, dateRegex_1, dateRegex_2)?
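For illustration, here is a hypothetical sketch of how both requirements could be met with one compiled pattern: two alternatives (year-first and month-first) in a single regex, with day and month converted to integers and re-padded so that `3` and `03` both come out as `03`. The names `DATE_RE` and `clean_dates` are made up for this sketch. It assumes every non-year-first date is month-first (as in the examples) and it does not check that the month or day is in range.

```python
import re

# Hypothetical combined pattern: year-first (YYYY/M/D) or month-first (M/D/YYYY).
# Python's re does not allow a group name to be reused, so each branch
# gets its own names.
DATE_RE = re.compile(r'''
    (?<!\d)                                                        # not inside a longer number
    (?:
        (?P<y1>\d{4}) [-/.] (?P<m1>\d{1,2}) [-/.] (?P<d1>\d{1,2})   # YYYY/M/D
      | (?P<m2>\d{1,2}) [-/.] (?P<d2>\d{1,2}) [-/.] (?P<y2>\d{4})   # M/D/YYYY
    )
    (?!\d)                                                         # not followed by another digit
''', re.VERBOSE)


def clean_dates(text):
    """Hypothetical helper: return every matched date as zero-padded DD-MM-YYYY."""
    cleaned = []
    for m in DATE_RE.finditer(text):
        if m.group('y1'):  # the year-first branch matched
            year, month, day = m.group('y1', 'm1', 'd1')
        else:              # the month-first branch matched
            year, month, day = m.group('y2', 'm2', 'd2')
        # int() drops any leading zero; :02d then adds one back uniformly
        cleaned.append(f'{int(day):02d}-{int(month):02d}-{year}')
    return cleaned


print(clean_dates('3/14/2015, 03-14-2015 and 2015/3/14'))
# ['14-03-2015', '14-03-2015', '14-03-2015']
```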
Solution 1:[1]
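One idea: use a deliberately loose regex to pull out anything that looks like a date, then let the dateparser library decide which of those matches are real dates.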
import re
# pip install dateparser (if required)
import dateparser
# quite crude pattern: 1-4 digits, then either / or -, repeated a couple of times
pattern = r'(\d{1,4}(?:/|-)\d{1,4}(?:/|-)\d{1,4})'
# this is just seen as text (it could come from the clipboard)...
data = '''
import dateparser
dates = ['1/14/2016', '05-14-2017', '2014/3/18', '2015-06-14 00:00:00', '13-13-2000000']
for date in dates:
    print(dateparser.parse(date))
'''
# pull out all strings matching the above pattern into a list
extracted_dates = re.findall(pattern, data)
# print the matched strings that dateparser thinks are dates;
# '13-13-2000000' matches the regex (as '13-13-2000'), but dateparser returns None for it
for date in extracted_dates:
    parsed = dateparser.parse(date)
    if parsed is not None:
        print(parsed)
Output:
2016-01-14 00:00:00
2017-05-14 00:00:00
2014-03-18 00:00:00
2015-06-14 00:00:00
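If adding dateparser as a dependency is undesirable, the same two-step idea (crude regex first, a real date parser as the filter) can be sketched with the standard library by swapping in `datetime.strptime` with an explicit list of formats. The `FORMATS` tuple and the helper names below are assumptions covering only the layouts from the question; unlike dateparser, this will not guess any other layout. Formatting the result with `strftime('%d-%m-%Y')` gives the zero-padded DD-MM-YYYY output the question asks for.

```python
import re
from datetime import datetime

# Same crude candidate pattern as in the answer: 1-4 digits, / or -, repeated
PATTERN = r'(\d{1,4}(?:/|-)\d{1,4}(?:/|-)\d{1,4})'

# Assumed accepted layouts (month-first or year-first);
# strptime's %m and %d accept one or two digits, %Y needs four
FORMATS = ('%m/%d/%Y', '%m-%d-%Y', '%Y/%m/%d', '%Y-%m-%d')


def parse_date(candidate):
    """Hypothetical helper: datetime for the first format that fits, else None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(candidate, fmt)
        except ValueError:
            continue
    return None


def clean_dates(text):
    """Hypothetical helper: extract candidates, keep valid dates as DD-MM-YYYY."""
    cleaned = []
    for candidate in re.findall(PATTERN, text):
        parsed = parse_date(candidate)
        if parsed is not None:
            cleaned.append(parsed.strftime('%d-%m-%Y'))
    return cleaned


sample = "dates = ['1/14/2016', '05-14-2017', '2014/3/18', '2015-06-14 00:00:00', '13-13-2000000']"
print(clean_dates(sample))
# ['14-01-2016', '14-05-2017', '18-03-2014', '14-06-2015']
```

Note that the crude pattern can still match a prefix of a longer number (it extracts `13-13-2000` from `13-13-2000000`); that candidate is rejected only because 13 is not a valid month. Wrapping the pattern in `(?<!\d)` and `(?!\d)` lookarounds would stop it from matching inside longer digit runs.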
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |