Python: keep only letters in string
What is the best way to remove all characters from a string that are not letters of the alphabet? I mean removing all spaces, punctuation, brackets, digits, mathematical operators, and so on.
For example:
input: 'as32{ vd"s k!+'
output: 'asvdsk'
Solution 1:[1]
You could use re, but you don't really need to.
>>> s = 'as32{ vd"s k!+'
>>> ''.join(x for x in s if x.isalpha())
'asvdsk'
>>> filter(str.isalpha, s) # Python 2.7 only: filter() on a str returns a str
'asvdsk'
>>> ''.join(filter(str.isalpha, s)) # Python 3: filter() returns an iterator, so join it
'asvdsk'
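One point worth knowing about the approach above: str.isalpha() is Unicode-aware, so accented letters such as é are kept. The following is a minimal sketch of how the two behaviours could be wrapped in one helper; the names keep_letters and ascii_only are illustrative and not part of the original answer, and str.isascii() assumes Python 3.7 or later.

```python
def keep_letters(text, ascii_only=False):
    """Return text with every non-letter character removed.

    keep_letters and ascii_only are hypothetical names used for illustration.
    """
    if ascii_only:
        # str.isascii() exists from Python 3.7 onwards.
        return ''.join(ch for ch in text if ch.isascii() and ch.isalpha())
    # str.isalpha() is True for any Unicode letter, not just a-z / A-Z.
    return ''.join(ch for ch in text if ch.isalpha())


print(keep_letters('as32{ vd"s k!+'))                  # asvdsk
print(keep_letters('caf\u00e9 42'))                    # café
print(keep_letters('caf\u00e9 42', ascii_only=True))   # caf
```

If only the 26 English letters should survive, the ascii_only branch is the one to use.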
Solution 2:[2]
If you want to use a regular expression, this may be quicker, especially on long strings:
import re
s = 'as32{ vd"s k!+'
print(re.sub('[^a-zA-Z]+', '', s))
prints
asvdsk
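Whether the regular expression is actually faster depends on the input and the Python version, so it is worth measuring rather than assuming. Below is a rough sketch using timeit; the function names with_regex and with_join and the repetition counts are arbitrary choices made for this illustration.

```python
import re
import timeit

# A longer input so that the difference is measurable.
s = 'as32{ vd"s k!+' * 1000

# Compile once and reuse the pattern object.
non_letters = re.compile('[^a-zA-Z]+')


def with_regex(text):
    return non_letters.sub('', text)


def with_join(text):
    return ''.join(ch for ch in text if ch.isalpha())


# Both approaches agree on ASCII-only input.
assert with_regex(s) == with_join(s)

for func in (with_regex, with_join):
    seconds = timeit.timeit(lambda: func(s), number=200)
    print(f'{func.__name__}: {seconds:.4f} s')
```

Note that the two functions only agree on ASCII input: the character class [^a-zA-Z] strips accented letters, while str.isalpha() keeps them.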
Solution 3:[3]
Here is a method that uses ASCII ranges to check whether a character is in the upper- or lower-case alphabet, and appends it to a string if it is:
s = 'as32{ vd"s k!+'
sfiltered = ''
for char in s:
    # 97-122 is 'a'-'z', 65-90 is 'A'-'Z'
    if 97 <= ord(char) <= 122 or 65 <= ord(char) <= 90:
        sfiltered += char
The variable sfiltered will then hold the result, 'asvdsk', as expected.
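The same ASCII-only check can be written without hard-coded code points. This is a sketch of one possible variant, not the answer author's code: it uses string.ascii_letters from the standard library, and the names ASCII_LETTERS and keep_ascii_letters are made up for the example.

```python
import string

# string.ascii_letters is 'a'-'z' followed by 'A'-'Z'.
ASCII_LETTERS = frozenset(string.ascii_letters)


def keep_ascii_letters(text):
    # Collect the matching characters and join them once at the end,
    # instead of growing a string with += on every iteration.
    return ''.join(ch for ch in text if ch in ASCII_LETTERS)


print(keep_ascii_letters('as32{ vd"s k!+'))  # asvdsk
```

Membership tests against a frozenset avoid calling ord() twice per character and make the intent readable at a glance.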
Solution 4:[4]
This simple expression matches all letters, including non-ASCII letters such as áàãéèêç and many more used in several languages.
r"[^\W\d]+"
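It means "match a sequence of one or more characters that are neither non-word characters nor digits". Because \w also matches the underscore, this pattern keeps underscores as well; use r"[^\W\d_]+" if they should be removed too.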
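The answer gives only the pattern, so here is a minimal sketch of how it might be applied in Python 3, where str patterns are Unicode-aware by default. It uses the variant [^\W\d_] so that underscores are dropped as well; the helper name only_letters is an assumption made for this example.

```python
import re

# [^\W\d_] = word characters minus digits and the underscore.
letters = re.compile(r'[^\W\d_]+')


def only_letters(text):
    # findall() returns every run of letters; join them back together.
    return ''.join(letters.findall(text))


# The sample ends with the letters á, ç, ã, o written as escapes.
sample = 'as32{ vd"s_k!+ \u00e1\u00e7\u00e3o'
print(only_letters(sample))  # asvdskáção
```

Unlike the [^a-zA-Z] pattern from Solution 2, this keeps accented and other non-ASCII letters.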
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | nehem |
Solution 3 | Patrick Yu |
Solution 4 | plpsanchez |