Deleting Rows in Dataframe After Exploding in Pandas [duplicate]

I have a dataframe that originally looked like this:

|student_name|subject                    |
|------------|---------------------------|
|smith       |['maths', 'english']       |
|jones       |['maths', 'english']       |
|alan        |['art', 'maths', 'english']|

I used explode to get the following table:

|student_name|subject|
|------------|-------|
|smith       |maths  |
|smith       |english|
|jones       |maths  |
|jones       |english|
|alan        |art    |
|alan        |maths  |
|alan        |english|

I then reset the index because I want to delete all rows containing the string 'maths'. However, instead of deleting only the rows containing 'maths', it deletes every row, as if the dataframe had never been exploded and reindexed.

Here's my code:

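import pandas as pd
# Assumed contents of `data`, reconstructed from the first table above
data = {"student_name": ["smith", "jones", "alan"],
        "subject": [["maths", "english"], ["maths", "english"], ["art", "maths", "english"]]}
student_df = pd.DataFrame(data)
student_df = student_df.explode('subject')
student_df = student_df.reset_index(drop=True)
student_df = student_df[student_df["subject"].str.contains("maths") == False]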

What am I doing wrong?
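
One thing worth checking, offered as an assumption about the cause rather than a confirmed diagnosis: explode repeats the original index label for every row it produces, so any label-based deletion performed before reset_index removes every row that shares a label with a 'maths' row. The sketch below uses sample data reconstructed from the first table; the names exploded, mask, dropped, flat and kept are illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the question's first table (an assumption).
data = {
    "student_name": ["smith", "jones", "alan"],
    "subject": [["maths", "english"], ["maths", "english"], ["art", "maths", "english"]],
}

exploded = pd.DataFrame(data).explode("subject")
print(exploded.index.tolist())  # [0, 0, 1, 1, 2, 2, 2]: labels repeat per student

# Label-based deletion BEFORE reset_index: every row sharing a label is removed.
mask = exploded["subject"].str.contains("maths").to_numpy()
dropped = exploded.drop(index=exploded.index[mask])
print(len(dropped))  # 0, because every student has a 'maths' row

# The same deletion AFTER reset_index removes only the matching rows.
flat = exploded.reset_index(drop=True)
flat_mask = flat["subject"].str.contains("maths").to_numpy()
kept = flat.drop(index=flat.index[flat_mask])
print(kept["subject"].tolist())  # ['english', 'english', 'art', 'english']
```

Filtering with a boolean mask, as in the snippet above the question, does not depend on index labels, so if all rows still disappear it may be that the frame being filtered is not the exploded one.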



Solution 1:[1]

A cleaner way to do this is to avoid repeated reassignment and chain the steps in a single pipeline, starting from the original (un-exploded) dataframe.

A few remarks:

  • You can pass a function/lambda to loc to refer to the dataframe itself.
  • Use ~ to invert the boolean result of str.contains.
  • If you want to check for an exact match, do not use str.contains; use eq/ne (equal/not equal) instead.

student_df2 = (student_df
 .explode('subject')
 .loc[lambda d: ~d['subject'].str.contains("maths")]
)

output:

  student_name  subject
0        smith  english
1        jones  english
2         alan      art
2         alan  english
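
The exact-match remark can be sketched as follows. This is a minimal example, not part of the original answer: the sample data is reconstructed from the question's first table, result is an illustrative name, and the trailing reset_index(drop=True) is an optional extra step for renumbering the rows.

```python
import pandas as pd

# Sample data reconstructed from the question's first table (an assumption).
student_df = pd.DataFrame({
    "student_name": ["smith", "jones", "alan"],
    "subject": [["maths", "english"], ["maths", "english"], ["art", "maths", "english"]],
})

result = (
    student_df
    .explode("subject")
    .loc[lambda d: d["subject"].ne("maths")]  # keep rows whose subject is not exactly 'maths'
    .reset_index(drop=True)                   # optional: renumber rows 0..n-1
)
print(result)
```

The design difference is that str.contains matches substrings (and treats the pattern as a regular expression by default), so a hypothetical subject such as 'applied maths' would also be removed, whereas ne only removes values equal to 'maths'.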

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mozway