Extract whole paragraph containing keyword from PDF of novel
I'm trying to analyze a novel by focusing on passages containing a specific keyword, like paragraphs containing the word "thought" or "night." What I have so far isn't working.
import tika
from tika import parser
parsed = parser.from_file('ethanfrome22.pdf')
keyword = ['thought']
if keyword in parsed["content"]:
    print(parsed["content"])
Solution 1:[1]
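I have never used tika, but from the looks of it you should look up the string 'thought' rather than the whole list. In Python, an "x in some_string" test requires x to be a string, so passing the list ['thought'] raises a TypeError. For example, change the code to: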
import tika
from tika import parser
parsed = parser.from_file('ethanfrome22.pdf')
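keywords = ['thought', 'or', 'two']
# Test each keyword string on its own; "in" on a str needs a str, not a list
for keyword in keywords:
    if keyword in parsed["content"]:
        # Prints the whole extracted text, once per keyword that occurs
        print(parsed["content"])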
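The loop above still prints the entire document whenever a keyword occurs; it does not isolate the paragraph the question asks about. Below is a minimal sketch of paragraph extraction, assuming the extracted text separates paragraphs with blank lines (PDF extraction does not always preserve this, so inspect parsed["content"] first). The function name paragraphs_with_keywords and the sample text are hypothetical, and only the standard-library re module is used.

```python
import re
from typing import Iterable, List


def paragraphs_with_keywords(text: str, keywords: Iterable[str]) -> List[str]:
    """Return paragraphs of text that contain any keyword as a whole word.

    Assumption: paragraphs are separated by one or more blank lines, which is
    common but not guaranteed in text extracted from a PDF.
    """
    keywords = list(keywords)
    if not keywords:
        # An empty alternation would match every paragraph, so bail out early
        return []
    # One pattern matching any of the keywords as a whole word, ignoring case
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    results = []
    # Split on blank lines (lines that are empty or hold only whitespace)
    for block in re.split(r"\n\s*\n", text):
        # Join hard-wrapped lines inside the paragraph into a single line
        paragraph = " ".join(block.split())
        if paragraph and pattern.search(paragraph):
            results.append(paragraph)
    return results


# Made-up sample text for illustration only
sample = """It was a cold night.
The snow lay deep.

He thought of the sled
and of the hill.

Nothing else happened."""

for p in paragraphs_with_keywords(sample, ["thought", "night"]):
    print(p)
```

To use it on the question's file, pass the tika output, for example paragraphs_with_keywords(parsed["content"] or "", ["thought", "night"]); the `or ""` guards against tika returning None for the content. Matching is on whole words, so "thoughtful" or "nights" will not match; remove the \b anchors if substring matches are wanted.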
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | midin |
