How can I use regex (based on values in a list) to extract values in a Pandas DataFrame?

This is my DataFrame, pd:

Product	Sales	Receipts
Paint	1000	400
Black paint	2000	300
White piant	3000	200
Orange pint	4000	100
Red wallpaper	4000	100
Green wall	4000	100

This is my code:

import re

list = ["paint", "pint", "piant"]
rgx_pd = re.compile('|'.join(list))

How can I use the values in the list to create two new DataFrames based on pd: one with all products matching the values in the list (pdt) and one with the products that do not match (pdf)?

Solution 1:^[1]

You can use the pandas.Series.str.contains method.

Assuming your DataFrame is named pd, as in the question:

pd[pd.Product.str.contains(rgx_pd, case=False, regex=True)]

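The problem is that this call needs the pattern as a plain string: a compiled regex cannot be combined with case=False, because flags cannot be applied to a pattern that is already compiled. So you're going to need to change this line: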

rgx_pd = re.compile('|'.join(list))

to:

rgx_pd = r'|'.join(list)
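
Putting the pieces together, here is a minimal sketch that also covers the "without" half of the question by negating the boolean mask with ~. The names df, terms, pattern and mask are my own assumptions: the DataFrame is called df instead of pd so that pd can stay the usual pandas alias, and terms replaces list so the built-in is not shadowed. The data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question. The frame is named `df` here
# (an assumption) so that `pd` can remain the usual pandas alias.
df = pd.DataFrame({
    "Product": ["Paint", "Black paint", "White piant",
                "Orange pint", "Red wallpaper", "Green wall"],
    "Sales": [1000, 2000, 3000, 4000, 4000, 4000],
    "Receipts": [400, 300, 200, 100, 100, 100],
})

# `terms` stands in for the question's `list`, which shadows the built-in.
terms = ["paint", "pint", "piant"]
pattern = "|".join(terms)  # plain string: "paint|pint|piant"

# Boolean mask: True where Product contains any of the terms, ignoring case.
mask = df["Product"].str.contains(pattern, case=False, regex=True)

pdt = df[mask]   # products matching a value in the list
pdf = df[~mask]  # products not matching any value in the list
```

With this data, pdt should hold the four paint-like rows and pdf the two wall rows. If the Product column can contain missing values, passing na=False to str.contains keeps the mask strictly boolean.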

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
