How to extract paragraphs containing a keyword from a PDF of a novel?
I am trying to extract the paragraphs of a novel that contain a particular keyword, but my code is not running properly. I am open to any suggestions, including alternative programs or modules, but I am limited to Python. Code below:
import PyPDF2
import re
pattern = ('.', 'thought', 'night', 'face', '.')
fileName = input('ethanfrome22.pdf')
object = PyPDF2.PdfFileReader(fileName)
numPages = object.getNumPages()
for i in range(0, numPages):
    pageObj = object.getPage(i)
    text = pageObj.extractText()
    for match in re.finditer(pattern, text):
        print(f'Page no: {i} | Match: {match}')
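A few things in the code above are likely to stop it from running. `pattern` is a tuple, and `re.finditer` raises a `TypeError` unless it is given a string or a compiled pattern. `input('ethanfrome22.pdf')` prints the file name as a prompt and waits for typed input instead of opening that file. `PdfFileReader`, `getNumPages`, `getPage` and `extractText` were removed in PyPDF2 3.0. The name `object` also shadows a Python built-in. Below is a minimal sketch of one way to approach this, not a definitive fix. It swaps in `pypdf` (the successor to PyPDF2) and assumes paragraphs are separated by blank lines in the extracted text, which PDF extraction often does not preserve, so the splitting rule may need adjusting for this file. The helper names (`build_pattern`, `split_paragraphs`, `paragraphs_with_keywords`, `search_pdf`) are illustrative and not part of any library.

```python
import re

# Keywords taken from the question; matched as whole words, ignoring case.
KEYWORDS = ("thought", "night", "face")


def build_pattern(keywords):
    """Compile one regex that matches any of the keywords as a whole word."""
    alternation = "|".join(re.escape(word) for word in keywords)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def split_paragraphs(text):
    """Split text on blank lines (an assumption about the extracted layout)."""
    parts = re.split(r"\n\s*\n", text)
    # Collapse line breaks and repeated spaces inside each paragraph.
    return [" ".join(part.split()) for part in parts if part.strip()]


def paragraphs_with_keywords(text, keywords=KEYWORDS):
    """Return the paragraphs of `text` that contain at least one keyword."""
    pattern = build_pattern(keywords)
    return [p for p in split_paragraphs(text) if pattern.search(p)]


def search_pdf(path, keywords=KEYWORDS):
    """Yield (page_number, paragraph) pairs for every matching paragraph."""
    # Imported here so the text helpers above work without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    for page_number, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        for paragraph in paragraphs_with_keywords(text, keywords):
            yield page_number, paragraph


# Example usage (needs `pip install pypdf` and the PDF in the working directory):
# for page_number, paragraph in search_pdf("ethanfrome22.pdf"):
#     print(f"Page no: {page_number} | Paragraph: {paragraph}")
```

Because the search runs page by page, a paragraph that continues across a page break is treated as two separate pieces. If that matters, one option is to join the text of all pages first and pass the result to `paragraphs_with_keywords`, at the cost of losing the page number.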
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
