'Is there a way to drop rows from a dataframe by comparing a column value to values in list?

I have a dataframe with three columns. I am trying to clean the data by dropping all the rows that do not have country names in the third column. Or basically I am trying to drop all the values in that third column that are not countries.

For that, I added a list of the country names to my notebook and now I would like to know if it is possible to drop all the values in that column that are not found in the list of countries?

The problem might be, that the values in the column are one big string of text from a product text of an HTML file. I already split the strings and dropped a couple of rows based on ";" and length, but now I am not sure how to continue.

I tried:

ProductDataFrame =
ProductDataFrame[~ProductDataFrame['Produkttext'].isin(CountriesList)]

which doesn't return an error but does not change anything in my dataframe...

This is what it looks like:

                                                                  Produkttext  
1                                                             Roter Kopfsalat.   
2                                                                       Italien  
3                                 Äthiopien,Marokko, Senegal, Ruanda oder Kenia  
4                                                                       Spanien  
5                                                                   Deutschland  
6                                           Deutschland, Niederlande oder Polen  
7                                                      Deutschland oder Italien  
8                                          Deutschland, Frankreich oder Italien  
9                                                                   Deutschland  
10                                                  Deutschland oder Österreich


Solution 1:[1]

After you split by ,, you can use explode to flatten out all the sublists, then use isin. When you want to group them back together into lists, use .groupby(level=0) (which groups by the 0th level of the index) and .agg(list):

e = df['Produkttext'].str.split(r'[\s,]+').explode()

# Remove "oder"
e = e[e.ne('oder')]

# Keep only the items that are in CountriesList
e = e[e.isin(CountriesList)]

# Convert the remaining items back to lists
e.groupby(level=0).agg(list)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1