Is there a way to remove all letters from a string
I have a list of titles with combined dates and descriptions, but I have to reduce this to just a list of dates. Some examples of these titles are stuff like this:
1/16 Stories of Time
5/18 Cock'a'doodle'do
However, some people are really bad at typing and have forgotten the spaces between the dates and the rest of the title. I need to remove everything except for numbers and the slashes between them. Using any method, but preferably regex, is there a simple way to do this? For the record, I do understand how to split and recompile the list for any method that would work on a single string.
Solution 1:[1]
You're thinking about this backwards. If you want to extract the date at the start of a line, do that instead of trying to get rid of everything else.
You can use a regex like this: ^\d{1,2}/\d{1,2} which means:
^ start of line
\d digit
{1,2} repeated one or two times
For example:
import re

lines = [
    '1/16 Stories of Time',
    "5/18 Cock'a'doodle'do",
    '6/22Bible']

for line in lines:
    match = re.match(r'^\d{1,2}/\d{1,2}', line)
    if match:
        print(match.group(0))
Output:
1/16
5/18
6/22
(Note that re.match always starts matching from the start of the string, so the ^ is redundant here.)
This is more robust against titles that themselves contain numbers and slashes, like say, 4/5 The 39 Steps / The Thirty-Nine Steps -> 4/5.
However, you'll still have a problem if someone forgot the space before a title that starts with a number, like say, 7/8100 Years of Solitude -> 7/81.
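One way to narrow (not eliminate) that last ambiguity is to accept only plausible date values. The sketch below assumes the dates are month/day, which the question does not actually say, and the names DATE_RE and extract_date are made up for illustration. It still guesses wrong on inputs like 1/3100 Days, so treat it as a heuristic.

```python
import re

# Assumption (not stated in the question): dates are month/day,
# with month 1-12 and day 1-31. Leading zeros are tolerated.
DATE_RE = re.compile(r'(1[0-2]|0?[1-9])/(3[01]|[12]\d|0?[1-9])')

def extract_date(title):
    """Return the leading month/day of title, or None if there is none."""
    match = DATE_RE.match(title)  # re.match anchors at the start of the string
    return match.group(0) if match else None

titles = ['1/16 Stories of Time', '6/22Bible', '7/8100 Years of Solitude']
dates = [extract_date(t) for t in titles]
print(dates)  # ['1/16', '6/22', '7/8']
```

Here 7/8100 comes out as 7/8 because 81 is not a valid day, so the pattern falls back to a one-digit day.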
Solution 2:[2]
You can import string to get easy access to a string of all digits, add the slash to it, and then compare your date string against that to drop any character from the date string that's not in there:
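import string

# Build a separate set of allowed characters instead of modifying string.digits.
allowed = string.digits + "/"

date_string = "5/18 Cock'a'doodle'do"
# Keep only the characters that are digits or a slash.
date_string = "".join(ch for ch in date_string if ch in allowed)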
This will convert the date_string 5/18 Cock'a'doodle'do to just 5/18 without using regex at all.
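If you want to run this over the whole list, the same idea can be wrapped in a small function. This is only a sketch; keep_date_chars and ALLOWED are names invented here, not anything from the string module.

```python
import string

# Characters to keep: the ten digits plus the slash.
ALLOWED = set(string.digits + "/")

def keep_date_chars(text):
    """Drop every character of text that is not a digit or a slash."""
    return "".join(ch for ch in text if ch in ALLOWED)

titles = ['1/16 Stories of Time', "5/18 Cock'a'doodle'do", '6/22Bible']
cleaned = [keep_date_chars(t) for t in titles]
print(cleaned)  # ['1/16', '5/18', '6/22']
```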
Solution 3:[3]
Barmar, in a comment on the original question, had the best answer. To remove everything except digits and slashes from the string, you can use this one line of code:
string = re.sub(r'[^\d/]', '', string)
This removes every character that is not a digit or a slash, so letters, spaces and punctuation are dropped while slashes are kept. Thank you Barmar, if you want to post this as an answer I can take this down and flag that instead.
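For reference, here is a small runnable sketch of that substitution applied to a list of titles (the variable names are just for illustration). Note that it keeps every digit and slash in the line, so a title that contains numbers of its own will leak into the result.

```python
import re

titles = [
    '1/16 Stories of Time',
    "5/18 Cock'a'doodle'do",
    '6/22Bible',
    '4/5 The 39 Steps / The Thirty-Nine Steps',
]

# [^\d/] matches any single character that is neither a digit nor a slash.
stripped = [re.sub(r'[^\d/]', '', title) for title in titles]
print(stripped)  # ['1/16', '5/18', '6/22', '4/539/']
```

The last entry shows the limitation: the 39 and the second slash from the title survive, giving 4/539/ rather than 4/5.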
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Schnitte |
| Solution 3 | Wall Runner |
