Extract whole paragraph containing keyword from PDF of novel

I'm trying to analyze a novel by focusing on passages containing a specific keyword, like paragraphs containing the word "thought" or "night." What I have so far isn't working.

import tika
from tika import parser
parsed = parser.from_file('ethanfrome22.pdf')
keyword = ['thought']
if keyword in parsed["content"]:
    print(parsed["content"])

Solution 1:^[1]

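I have never used tika, but from the looks of it the check fails because keyword is a list, and the in operator on a string needs a string on its left side. Try looking up the string "thought" instead of the whole list, e.g. change the code to: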

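import tika
from tika import parser

parsed = parser.from_file('ethanfrome22.pdf')
# "content" can be None if no text was extracted, so fall back to an empty string
content = parsed["content"] or ""
keywords = ['thought', 'or', 'two']

# Test each keyword string on its own rather than the whole list
for keyword in keywords:
    if keyword in content:
        print(content)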
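
Note that the loop above still prints the entire text whenever a keyword is found. To get only the paragraphs that contain a keyword, one option is to split the extracted text into paragraphs first and filter them. The following is a minimal sketch, not a tested recipe for this particular PDF. It assumes paragraphs in the extracted text are separated by blank lines, which may not hold for every PDF, and the helper names split_paragraphs and paragraphs_with_keywords are made up for illustration. It matches whole words only, so "thought" does not match "thoughtful".

```python
import re


def split_paragraphs(text):
    """Split text into paragraphs, assuming blank lines separate them."""
    # Assumption: one or more blank lines mark a paragraph break.
    blocks = re.split(r"\n\s*\n", text)
    # Join wrapped lines inside each paragraph with single spaces.
    return [" ".join(block.split()) for block in blocks if block.strip()]


def paragraphs_with_keywords(text, keywords):
    """Return paragraphs containing any keyword as a whole word, ignoring case."""
    if not keywords:
        return []
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [p for p in split_paragraphs(text) if pattern.search(p)]


# Made-up sample text standing in for the extracted novel.
sample = (
    "He thought about the long\nwalk home.\n\n"
    "The night was cold.\n\n"
    "Nothing else happened."
)
matches = paragraphs_with_keywords(sample, ["thought", "night"])
for paragraph in matches:
    print(paragraph)
```

With tika, the text argument would be parsed["content"] or "" from the code above. If the extracted text does not contain blank lines between paragraphs, the split pattern would need adjusting to whatever separator the extraction actually produces.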

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	midin
