Use regex to search for a phrase where the last word has a maximum number of characters
So I have a list of words below:
list = ['cyber attacks, 28','cyber attacks. A', 'cyber attacks, intrusions', 'cyber attacks; 3','cyber attack. Our', 'cyber intrusions, data']
What I want to do is to remove the phrases in the list if the third word has more than three characters. So the final list would be:
new_list = ['cyber attacks, 28','cyber attacks. A', 'cyber attacks; 3','cyber attack. Our']
This is what I have so far, but it also includes the phrases where the last word has more than three characters:
new_list = []
for phrase in list:
    max_three_char = re.match('cyber\s\w{1,}(\.|,|;|\)|\/|:|"|])\s\w{,3}', phrase)
    if max_three_char:
        new_list.append(phrase)
Solution 1:[1]
I would do:
import re
li = ['cyber attacks, 28','cyber attacks. A', 'cyber attacks, intrusions', 'cyber attacks; 3','cyber attack. Our', 'cyber intrusions, data']
>>> [s for s in li if re.search(r'(?<=\W)\w{1,3}$', s)]
['cyber attacks, 28', 'cyber attacks. A', 'cyber attacks; 3', 'cyber attack. Our']
Or, if you can count on a space delimiter:
>>> [s for s in li if len(s.split()[-1])<=3]
# same
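The `$` anchor in the regex above is what makes it work: without an end anchor, `\w{,3}` is satisfied by the first three characters of a longer word, which is why the attempt in the question lets every phrase through. A minimal sketch of that difference follows, assuming a simplified pattern (`\W` in place of the question's punctuation group) and the illustrative names `phrases`, `pattern`, `kept_with_match`, and `kept_with_fullmatch`, none of which come from the original posts.

```python
import re

phrases = ['cyber attacks, 28', 'cyber attacks. A', 'cyber attacks, intrusions',
           'cyber attacks; 3', 'cyber attack. Our', 'cyber intrusions, data']

# Simplified stand-in for the question's pattern (an assumption for illustration).
pattern = re.compile(r'cyber\s\w+\W\s\w{,3}')

# re.match only anchors at the start, so a prefix of a long last word is enough.
kept_with_match = [p for p in phrases if pattern.match(p)]

# re.fullmatch requires the pattern to consume the whole string,
# so the last word really is limited to three characters.
kept_with_fullmatch = [p for p in phrases if pattern.fullmatch(p)]

print(kept_with_match)      # all six phrases
print(kept_with_fullmatch)  # only the four phrases with a short last word
```

Appending `$` to the pattern and keeping `re.match` should have the same effect here as `re.fullmatch`.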
Solution 2:[2]
You could use a list comprehension as in
import re
lst = ['cyber attacks, 28','cyber attacks. A', 'cyber attacks, intrusions', 'cyber attacks; 3','cyber attack. Our', 'cyber intrusions, data']
pattern = re.compile(r'[, ]+')
new_lst = [item
           for item in lst
           for splitted in [pattern.split(item)]
           if not (len(splitted) > 2 and len(splitted[2]) > 3)]
print(new_lst)
Which would yield
['cyber attacks, 28', 'cyber attacks. A', 'cyber attacks; 3', 'cyber attack. Our']
Don't name your variables after built-ins such as list; doing so shadows the built-in for the rest of that scope.
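To illustrate the naming advice, here is a small sketch of what shadowing `list` does. The two helper functions are hypothetical and exist only to keep the shadowing confined to one scope.

```python
def call_shadowed_list():
    # Hypothetical helper: rebinding the name `list` hides the built-in
    # inside this function.
    list = ['cyber attacks, 28', 'cyber attacks. A']
    try:
        list('abc')  # `list` is now a list object, which is not callable
    except TypeError:
        return 'TypeError'
    return 'ok'


def call_builtin_list():
    # With a different variable name, the built-in stays available.
    lst = ['cyber attacks, 28', 'cyber attacks. A']
    return list('abc')


print(call_shadowed_list())
print(call_builtin_list())
```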
Solution 3:[3]
No need for regex; you can use str.split:
if len(my_phrase.split()[2]) <= 3:
    ...  # process my_phrase
This works since there are spaces between the words.
Solution 4:[4]
Since your separator is a space, you don't need regex; Python's built-in str.split() method is enough.
ls = ['cyber attacks, 28', 'cyber attacks. A', 'cyber attacks, intrusions', 'cyber attacks; 3', 'cyber attack. Our',
      'cyber intrusions, data']


def my_filter(i) -> bool:
    sub_str = i.split(' ', 2)
    if len(sub_str) < 3:
        return False
    return len(sub_str[2]) <= 3


print([i for i in ls if my_filter(i)])
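One detail worth checking if your phrases can contain more than three words: `split(' ', 2)` keeps everything after the second space as a single item, so the length test applies to the whole remainder rather than to the third word alone. The sketch below contrasts the two behaviours; the function names and the four-word sample phrase are illustrative assumptions, not part of the original answer.

```python
def remainder_is_short(phrase: str, limit: int = 3) -> bool:
    # Mirrors the maxsplit approach: the third item is the rest of the string.
    parts = phrase.split(' ', 2)
    return len(parts) >= 3 and len(parts[2]) <= limit


def third_word_is_short(phrase: str, limit: int = 3) -> bool:
    # Splits on any whitespace and looks only at the third word.
    words = phrase.split()
    return len(words) >= 3 and len(words[2]) <= limit


sample = 'cyber attacks, in it'  # hypothetical four-word phrase
print(remainder_is_short(sample))   # remainder is 'in it', five characters
print(third_word_is_short(sample))  # third word is 'in', two characters
```

For the three-word phrases in the question, both functions agree.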
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | dawg |
| Solution 2 | Jan |
| Solution 3 | TDG |
| Solution 4 | iElden |
