Check if a column of strings contains a word from a list of strings and extract the matched words in Python

I have a DataFrame and a list of keywords. How can I extract the matched words from the text column of the DataFrame? Can anyone help? Thank you!

**DataFrame**

df = pd.DataFrame({'ID':range(1,6), 'text':['red blue', 'bbb', 'rrrr blue', 'yyy b', 'ed yye']})

**Keyword list**

kword = ['red', 'rrrr']

I have tried following:

keyword = r"keyword.csv"
kword = pd.read_csv(keyword , encoding_errors='ignore')
Wrd_list = kword.values.tolist()
pattern = '|'.join(str(v) for v in Wrd_list)

filename = r"text.csv"
data = pd.read_csv(filename, encoding_errors='ignore')
df = pd.DataFrame(data, columns=["id", "Text"])
df['Match_Word'] = df['Text'].str.extract(f"({'|'.join(pattern)})")

But the output keeps only the first letter of each match (I also tried the extractall function, but it raised an error):

0  R
1 
2  R
3  
4 
5

My desired output should be:

0 red
1 
2 rrrr
3
4
5


Solution 1:[1]

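Your code mostly works. I think the issue is that you are building the wrong keyword pattern. Try passing header=None when reading the keyword CSV; otherwise pandas treats the first row of the file as the column header, so the first keyword never reaches the list.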
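To illustrate the effect of header=None, here is a minimal sketch. It assumes the keyword file holds one keyword per line with no header row; the in-memory `csv_text` is a hypothetical stand-in for keyword.csv, not the asker's actual file.

```python
import io

import pandas as pd

# Hypothetical stand-in for keyword.csv: one keyword per line, no header row.
csv_text = "red\nrrrr\n"

# Default behaviour: the first line is consumed as the column header.
with_header = pd.read_csv(io.StringIO(csv_text))
# header=None: every line is treated as data.
without_header = pd.read_csv(io.StringIO(csv_text), header=None)

default_words = with_header.iloc[:, 0].tolist()
all_words = without_header.iloc[:, 0].tolist()
print(default_words)  # the first keyword is missing from this list
print(all_words)      # both keywords are present
```

Selecting the first column with `.iloc[:, 0]` also yields a flat list of strings, whereas `.values.tolist()` returns one nested list per row.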

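import pandas as pd

# header=None keeps the first row of the keyword file as data instead of a header
keyword = "np-match/keyword.csv"
kword = pd.read_csv(keyword, encoding_errors="ignore", header=None)
Wrd_list = kword.values.tolist()
pattern = "|".join(str(v) for v in Wrd_list)

# Keywords hard-coded for this demonstration; this overrides the pattern built above
pattern = ["red", "rrr"]
filename = "np-match/text.csv"
data = pd.read_csv(filename, encoding_errors="ignore")
df = pd.DataFrame(data, columns=["id", "Text"])
# Join the keyword list once into a single alternation and capture the first match per row
df["Match_Word"] = df["Text"].str.extract(f"({'|'.join(pattern)})")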


   id      Text Match_Word
0   1  red blue        red
1   2      bbbb        NaN
2   3  rrr blue        rrr
3   4      yyyy        NaN
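
A likely reason the question's attempt kept only the first letter is that `pattern` was already a joined string when it was passed to `'|'.join(...)` a second time, which splits it into single-character alternatives. The sketch below shows this on the sample data from the question. The use of `re.escape`, the `\b` word boundaries, and the `All_Matches` column are assumptions added for illustration and are not part of the original answer.

```python
import re

import pandas as pd

df = pd.DataFrame({"ID": range(1, 6),
                   "text": ["red blue", "bbb", "rrrr blue", "yyy b", "ed yye"]})
kword = ["red", "rrrr"]

# Joining a list gives one alternative per keyword.
pattern = "|".join(re.escape(w) for w in kword)
# Joining the resulting string again puts "|" between every character,
# so the regex can only ever match one character at a time.
double_joined = "|".join(pattern)
print(pattern)
print(double_joined)

# Word boundaries (an added assumption) stop "red" from matching inside a longer word.
regex = rf"\b({pattern})\b"

# First matching keyword per row; rows without a match get NaN.
df["Match_Word"] = df["text"].str.extract(regex, expand=False)
# Every matching keyword per row, as a list (empty when nothing matches).
df["All_Matches"] = df["text"].str.findall(regex)
print(df)
```

`str.findall` is used here for rows that may contain several keywords, since it keeps one row per input row, unlike `str.extractall`, which returns a separate multi-indexed frame.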

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 makaramkd