How do I keep the sentence based on a value in a different column?

I have a text column that I want to filter, sentence by sentence, based on the value in another column (filter). In the example below, one column contains a short article and the other contains the filter value.

    filter       text   
0   blue|shirt   In the state of alabama it is 35 degrees. We advise a blue shirt. Don't do anything else. Just where a shirt.
1   green|shirt  In the Netherlands you don't use a shirt. It is cold. We use Green sweaters. We advise a blue shirt. 
2   Red|shirt    This news is not good. The colour red makes me angry. 

I'm looking for a way to produce the column below.

    filter       Desired outcome   
0   blue|shirt   We advise a blue shirt. Just where a shirt.
1   green|shirt  In the Netherlands you don't use a shirt. We use Green sweaters. 
2   Red|shirt    The colour red makes me angry. 

I tried a lot of different ways but didn't succeed. It would also be great to have the option to get one sentence before and one sentence after the sentence that matches the filter value.

What is the best code to use for this problem?



Solution 1:[1]

We can use split, explode, a bit of regex and groupby:

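import re

# Split each article into sentences (raw string so '\.' is not read as a string escape)
df['s'] = df['text'].str.split(r'\. ')
# One row per sentence; the original index is repeated for every sentence of a text
df1 = df.explode('s').drop(columns = 'text')
# True where the sentence matches the row's filter pattern, ignoring case
matched = df1.apply(lambda r: re.search(r['filter'].lower(), r['s'].lower()) is not None, axis=1)
df1[matched].groupby('filter', sort = False).agg('. '.join).reset_index().rename(columns = {'s':'text'})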

Output:

    filter       text
--  -----------  ----------------------------------------------------------------------------------------
 0  blue|shirt   We advise a blue shirt. Just where a shirt.
 1  green|shirt  In the Netherlands you don't use a shirt. We use Green sweaters. We advise a blue shirt.
 2  Red|shirt    The colour red makes me angry.
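
For a version that runs end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same split/explode/match steps. It makes two changes that are my own assumptions rather than part of the code above: it uses `re.IGNORECASE` instead of lowercasing both strings, and it groups by the original row index instead of the filter column, so that two rows sharing the same filter value are not merged. The names `sentences`, `kept` and `result` are illustrative.

```python
import re
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "filter": ["blue|shirt", "green|shirt", "Red|shirt"],
    "text": [
        "In the state of alabama it is 35 degrees. We advise a blue shirt. "
        "Don't do anything else. Just where a shirt.",
        "In the Netherlands you don't use a shirt. It is cold. "
        "We use Green sweaters. We advise a blue shirt.",
        "This news is not good. The colour red makes me angry.",
    ],
})

# One row per sentence; explode repeats the original row index
sentences = df.assign(s=df["text"].str.split(r"\. ")).explode("s")

# True where the sentence matches its row's filter pattern (case-insensitive)
matched = sentences.apply(
    lambda r: re.search(r["filter"], r["s"], flags=re.IGNORECASE) is not None,
    axis=1,
)

# Re-join the kept sentences per original row (index level 0)
kept = sentences[matched]
result = pd.DataFrame({
    "filter": df["filter"],
    "text": kept.groupby(level=0)["s"].agg(". ".join),
})
print(result)
```

A row with no matching sentence ends up with a missing value in `text`, because the two columns are aligned on the original index.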

To also get one sentence before and one sentence after each matched sentence, add the following line right after the matched = ... line:

...
matched = matched | matched.groupby(level=0).shift(1) | matched.groupby(level=0).shift(-1)
...

With your sample data this is not very interesting, because every sentence is either a match or next to one, so it simply pulls in all sentences.
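
To see the neighbour logic select a real subset, here is a small sketch on a made-up, longer text (hypothetical, not from the question). It assumes `fill_value=False` in `shift` so the shifted mask stays boolean at the start and end of each text; the variable names are illustrative.

```python
import re
import pandas as pd

# Hypothetical example: long enough that the context window does not cover everything
df = pd.DataFrame({
    "filter": ["blue|shirt"],
    "text": [
        "It is hot. Stay inside. We advise a blue shirt. "
        "Drink water. Avoid the sun. Sleep early."
    ],
})

# One row per sentence, keeping the original row index
sentences = df.assign(s=df["text"].str.split(r"\. ")).explode("s")
matched = sentences.apply(
    lambda r: re.search(r["filter"], r["s"], flags=re.IGNORECASE) is not None,
    axis=1,
)

# Shift the mask within each original row so a match also flags its neighbours
by_row = matched.groupby(level=0)
with_context = (
    matched
    | by_row.shift(1, fill_value=False)   # flags the sentence after a match
    | by_row.shift(-1, fill_value=False)  # flags the sentence before a match
)

result = sentences[with_context].groupby(level=0)["s"].agg(". ".join)
print(result.loc[0])  # Stay inside. We advise a blue shirt. Drink water
```

Grouping by the index level keeps the shift from leaking across texts: the last sentence of one article never counts as the neighbour of the first sentence of the next.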

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 piterbarg