How do I keep only the sentences that match a value in a different column?
I have a text column that I want to filter sentence by sentence, based on the value in another column (filter). In the example below, the text column contains a short article and the filter column contains the filter value.
filter text
0 blue|shirt In the state of alabama it is 35 degrees. We advise a blue shirt. Don't do anything else. Just where a shirt.
1 green|shirt In the Netherlands you don't use a shirt. It is cold. We use Green sweaters. We advise a blue shirt.
2 Red|shirt This news is not good. The colour red makes me angry.
I'm looking for a way to produce the column below.
filter Desired outcome
0 blue|shirt We advise a blue shirt. Just where a shirt.
1 green|shirt In the Netherlands you don't use a shirt. We use Green sweaters.
2 Red|shirt The colour red makes me angry.
I tried a lot of different ways but I didn't succeed. It would also be great to have the option of keeping one sentence before and one sentence after each sentence that matches the filter value.
What is the best code to use for this problem?
Solution 1:[1]
We can use split, explode, a bit of regex and groupby:
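import re
# Split each text into sentences, then give every sentence its own row
df['s'] = df['text'].str.split(r'\. ')
df1 = df.explode('s').drop(columns='text')
# The filter value (e.g. 'blue|shirt') is used as a case-insensitive regex
matched = df1.apply(lambda r: re.search(r['filter'], r['s'], flags=re.IGNORECASE) is not None, axis=1)
# Keep the matching sentences and join them back into one text per filter
df1[matched].groupby('filter', sort=False).agg('. '.join).reset_index().rename(columns={'s': 'text'})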
Output:
filter text
-- ----------- ----------------------------------------------------------------------------------------
0 blue|shirt We advise a blue shirt. Just where a shirt.
1 green|shirt In the Netherlands you don't use a shirt. We use Green sweaters. We advise a blue shirt.
2 Red|shirt The colour red makes me angry.
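Note that row 1 also keeps "We advise a blue shirt." because that sentence matches the shirt half of green|shirt. For anyone who wants to run the approach end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question. The helper name keep_matching_sentences and the kept column are made up for illustration, and grouping on the original row index instead of on filter is a swapped-in choice (it keeps rows apart when two rows share the same filter value), not what the answer above does.

```python
import re
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "filter": ["blue|shirt", "green|shirt", "Red|shirt"],
    "text": [
        "In the state of alabama it is 35 degrees. We advise a blue shirt. "
        "Don't do anything else. Just where a shirt.",
        "In the Netherlands you don't use a shirt. It is cold. "
        "We use Green sweaters. We advise a blue shirt.",
        "This news is not good. The colour red makes me angry.",
    ],
})

def keep_matching_sentences(frame):
    """Hypothetical helper: return, per original row, the sentences matching that row's filter."""
    # A multi-character pattern is treated as a regex by str.split;
    # explode repeats the original row index for every sentence.
    sentences = frame.assign(s=frame["text"].str.split(r"\. ")).explode("s")
    hit = sentences.apply(
        lambda r: re.search(r["filter"], r["s"], flags=re.IGNORECASE) is not None,
        axis=1,
    )
    # Group on the original row index (level 0) and glue the kept sentences back together.
    return sentences.loc[hit, "s"].groupby(level=0).agg(". ".join)

df["kept"] = keep_matching_sentences(df)
print(df[["filter", "kept"]])
```

Rows without any matching sentence would end up as NaN in kept, since they are absent from the grouped result.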
To also keep one sentence before and one after each match, add the following right after the matched = ... line:
...
matched = matched | matched.groupby(level=0).shift(1, fill_value=False) | matched.groupby(level=0).shift(-1, fill_value=False)
...
For your sample data this is not very interesting, because every sentence is either a match or next to one, so it simply pulls in all sentences.
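To see the neighbour logic actually drop something, here is a rough sketch on a longer, made-up text where only the middle sentence matches. The sample row, the widen helper and its before/after parameters are all hypothetical; the shifting is done per original row, so context never leaks from one article into the next.

```python
import re
import pandas as pd

# Hypothetical sample: only the third of five sentences mentions the filter term
df = pd.DataFrame({
    "filter": ["red"],
    "text": ["One. Two. The colour red here. Four. Five."],
})

sentences = df.assign(s=df["text"].str.split(r"\. ")).explode("s")
hit = sentences.apply(
    lambda r: re.search(r["filter"], r["s"], flags=re.IGNORECASE) is not None,
    axis=1,
)

def widen(hit, before=1, after=1):
    """Also flag up to `before`/`after` sentences around each hit, within each original row."""
    by_row = hit.groupby(level=0)
    out = hit.copy()
    for k in range(1, before + 1):
        # shift(-k) moves a hit onto the sentence k positions before it
        out = out | by_row.shift(-k, fill_value=False).astype(bool)
    for k in range(1, after + 1):
        # shift(k) moves a hit onto the sentence k positions after it
        out = out | by_row.shift(k, fill_value=False).astype(bool)
    return out

kept = sentences.loc[widen(hit), "s"].groupby(level=0).agg(". ".join)
print(kept[0])
```

Because the split removes the ". " separators, a kept block that does not end on the last sentence of the text has no trailing period; append one if you need it.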
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | piterbarg |
