Find all sentences containing specific words
I have a string consisting of several sentences and want to find every sentence that contains at least one of a set of keywords, e.g. keyword1 or keyword2:
import re
s = "This is a sentence which contains keyword1. And keyword2 is inside this sentence. "
pattern = re.compile(r"([A-Z][^\.!?].*(keyword1)|(keyword2).*[\.!?])\s")
for match in pattern.findall(s):
    print(match)
Output:
('This is a sentence which contains keyword1', 'keyword1', '')
('keyword2 is inside this sentence. ', '', 'keyword2')
Expected Output:
('This is a sentence which contains keyword1', 'keyword1', '')
('And keyword2 is inside this sentence. ', '', 'keyword2')
As you can see, the second match doesn't contain the whole sentence in the first group. What am I missing here?
Solution 1:[1]
You can try the following regular expression:
[.?!]*\s*(.*(keyword1)[^.?!]*[.?!]|.*(keyword2)[^.?!]*[.?!])
Code:
import re
s = "This is a sentence which contains keyword1. And keyword2 is inside this sentence. "
pattern = re.compile(r"[.?!]*\s*(.*(keyword1)[^.?!]*[.?!]|.*(keyword2)[^.?!]*[.?!])")
for match in pattern.findall(s):
    print(match)
Output:
('This is a sentence which contains keyword1.', 'keyword1', '')
('And keyword2 is inside this sentence.', '', 'keyword2')
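As for why the pattern in the question fails: `|` has the lowest precedence in a regular expression, so inside the outer group it separates two complete alternatives, `[A-Z][^\.!?].*(keyword1)` and `(keyword2).*[\.!?]`. The second alternative therefore starts matching at keyword2 itself, which is why "And " is lost. Note also that the leading `.*` in the pattern above is greedy and can run across sentence boundaries when a sentence without a keyword precedes one that has a keyword. The following is a minimal sketch of an alternative, assuming that sentences end in `.`, `!` or `?` and that only the sentence text (not the keyword groups) is needed; the name `sentences` and the `\b` word boundaries are assumptions added for illustration:

```python
import re

s = "This is a sentence which contains keyword1. And keyword2 is inside this sentence. "

# [^.!?]* cannot cross a sentence terminator, so each match stays inside one sentence.
# (?:keyword1|keyword2) groups the alternation so that "|" applies only to the keywords.
# \b (an added assumption) keeps e.g. "keyword10" from counting as "keyword1".
pattern = re.compile(r"[^.!?]*\b(?:keyword1|keyword2)\b[^.!?]*[.!?]")

# Strip the whitespace left over from the gap between sentences.
sentences = [m.group().strip() for m in pattern.finditer(s)]
for sentence in sentences:
    print(sentence)
```

For the input "No match here. This has keyword1." this sketch returns only "This has keyword1.". Drop the `\b` anchors if substring matches are wanted.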
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | hc_dev |
