How can I use regex (based on values in a list) to extract values in a Pandas DataFrame?
This is my DataFrame, pd:
| Product | Sales | Receipts |
|---|---|---|
| Paint | 1000 | 400 |
| Black paint | 2000 | 300 |
| White piant | 3000 | 200 |
| Orange pint | 4000 | 100 |
| Red wallpaper | 4000 | 100 |
| Green wall | 4000 | 100 |
This is my code:

```python
import re
list = ["paint", "pint", "piant"]  # note: this name shadows the built-in list
rgx_pd = re.compile('|'.join(list))
```
How can I use the values in the list to split pd into two new DataFrames: one with all products that match any value in the list (pdt) and one with the remaining products (pdf)?
Solution 1:[1]
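You can use the pandas.Series.str.contains method. Assuming your DataFrame is named pd, as in the question (if pandas itself is imported as pd, a different name such as df avoids shadowing the module):

```python
# case=False so that "Paint" also matches the lowercase pattern "paint"
pd[pd.Product.str.contains(rgx_pd, case=False, regex=True)]
```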
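The catch is that str.contains expects the pattern as a plain string here: with case=False, pandas compiles the pattern itself with the re.IGNORECASE flag, and re.compile raises a ValueError when flags are passed together with an already compiled pattern. So you're going to need to change this line:

```python
rgx_pd = re.compile('|'.join(list))
```

to:

```python
rgx_pd = '|'.join(list)  # plain string pattern; str.contains compiles it itself
```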
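A fuller sketch that produces both DataFrames the question asks for: the same boolean mask selects the matching rows (pdt), and negating it with ~ selects the rest (pdf). It assumes pandas is imported as pd, so the DataFrame is called df here instead of pd; the names df, patterns, rgx and mask, and the re.escape step, are illustrative additions rather than part of the original answer. The last lines show one way to keep a compiled pattern: call its search method on each value with Series.map instead of passing it to str.contains.

```python
import re

import pandas as pd

# Sample data from the question (hypothetically renamed df so it doesn't shadow pandas)
df = pd.DataFrame(
    {
        "Product": ["Paint", "Black paint", "White piant", "Orange pint", "Red wallpaper", "Green wall"],
        "Sales": [1000, 2000, 3000, 4000, 4000, 4000],
        "Receipts": [400, 300, 200, 100, 100, 100],
    }
)

patterns = ["paint", "pint", "piant"]  # hypothetical name; avoids shadowing the built-in list

# Escape each value so regex metacharacters are matched literally, then OR them together
rgx = "|".join(re.escape(p) for p in patterns)

# Boolean mask: True where Product contains any of the patterns, ignoring case
mask = df["Product"].str.contains(rgx, case=False, regex=True)

pdt = df[mask]   # products that match
pdf = df[~mask]  # products that don't

print(pdt["Product"].tolist())  # ['Paint', 'Black paint', 'White piant', 'Orange pint']
print(pdf["Product"].tolist())  # ['Red wallpaper', 'Green wall']

# Alternative that keeps a compiled pattern: put case-insensitivity into the pattern
# itself and apply its search method element-wise instead of using str.contains
rgx_compiled = re.compile(rgx, re.IGNORECASE)
mask_compiled = df["Product"].map(lambda s: rgx_compiled.search(s) is not None)
```

The re.escape step only matters if the list values can contain characters such as "." or "+"; for plain words like these it leaves the pattern unchanged.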
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
