How can I use regex (based on values in a list) to extract values in a Pandas DataFrame?

This is my DataFrame pd.

Product         Sales  Receipts
Paint            1000       400
Black paint      2000       300
White piant      3000       200
Orange pint      4000       100
Red wallpaper    4000       100
Green wall       4000       100

This is my code

import re

list = ["paint", "pint", "piant"]
rgx_pd = re.compile('|'.join(list))

How can I use the values in the list to create two new DataFrames from pd: one with all products that match the values in the list (pdt) and one with the products that do not match (pdf)?



Solution 1:[1]

You can use the pandas.Series.str.contains method.

Assuming your DataFrame is named pd, as in the question:

pd[pd.Product.str.contains(rgx_pd, case=False, regex=True)]

The problem is that this call does not accept a compiled regular expression, so you are going to need to replace this line:

rgx_pd = re.compile('|'.join(list))

With:

rgx_pd = r'|'.join(list)
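
The question also asks for the non-matching rows (pdf), which the snippet above does not cover. A minimal sketch of one way to get both frames, assuming the boolean mask returned by str.contains can simply be negated with ~. The names df, terms, pattern, and mask are illustrative and not from the question; df is used instead of pd so the usual pandas alias is not shadowed.

```python
import pandas as pd

# Sample data copied from the question. The frame is called df here
# (an assumption) so that it does not shadow the pandas alias pd.
df = pd.DataFrame({
    "Product": ["Paint", "Black paint", "White piant",
                "Orange pint", "Red wallpaper", "Green wall"],
    "Sales": [1000, 2000, 3000, 4000, 4000, 4000],
    "Receipts": [400, 300, 200, 100, 100, 100],
})

# Renamed from list (hypothetical name) to avoid shadowing the built-in.
terms = ["paint", "pint", "piant"]
# Plain string pattern, not a compiled one.
pattern = "|".join(terms)

# Boolean mask: True where Product contains any of the terms, ignoring case.
mask = df["Product"].str.contains(pattern, case=False, regex=True)

pdt = df[mask]    # rows whose Product matches one of the terms
pdf = df[~mask]   # rows whose Product matches none of them
```

Building the mask once and negating it keeps the two frames complementary: every row of df ends up in exactly one of pdt or pdf.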

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1