Deleting Rows in Dataframe After Exploding in Pandas [duplicate]
I have a dataframe that originally looked like this:
|student_name|subject |
|------------|---------------------------|
|smith |['maths', 'english'] |
|jones |['maths', 'english'] |
|alan |['art', 'maths', 'english']|
I used explode to get the following table:
|student_name|subject|
|------------|-------|
|smith |maths |
|smith |english|
|jones |maths |
|jones |english|
|alan |art |
|alan |maths |
|alan |english|
I then reset the index, as I want to delete all rows containing the string 'maths'. However, instead of deleting only the rows containing 'maths', it deletes all rows, as if they hadn't been exploded/reindexed.
Here's my code:
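import pandas as pd

# data reconstructed from the first table above
data = {"student_name": ["smith", "jones", "alan"],
        "subject": [["maths", "english"], ["maths", "english"], ["art", "maths", "english"]]}
student_df = pd.DataFrame(data)
student_df = student_df.explode('subject')
student_df = student_df.reset_index(drop=True)
student_df = student_df[student_df["subject"].str.contains("maths") == False]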
What am I doing wrong?
Solution 1:[1]
The ideal way to do this is to avoid multiple assignments and to use a pipeline.
A few remarks:
- You can pass a function/lambda to `loc` to refer to the dataframe itself.
- Use `~` to invert the value of `str.contains`.
- If you want to check for exact match, do not use `str.contains` but `eq`/`ne` (equal/not equal).
student_df2 = (student_df
.explode('subject')
.loc[lambda d: ~d['subject'].str.contains("maths")]
)
output:
student_name subject
0 smith english
1 jones english
2 alan art
2 alan english
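To make the remarks above concrete, here is a minimal, self-contained sketch that contrasts substring matching (`str.contains`) with exact matching (`ne`). It assumes the data shown in the question; the extra subject 'further maths' is hypothetical and added only to make the difference visible. The names `exploded`, `no_maths_substring`, and `no_maths_exact` are likewise illustrative.

```python
import pandas as pd

# Sample data mirroring the question. 'further maths' is a hypothetical
# extra subject, added only to show substring vs. exact matching.
data = {
    "student_name": ["smith", "jones", "alan"],
    "subject": [
        ["maths", "english"],
        ["maths", "english", "further maths"],
        ["art", "maths", "english"],
    ],
}
student_df = pd.DataFrame(data)

exploded = student_df.explode("subject")
# explode repeats the original index label for every list element,
# so labels are duplicated until the index is reset.
print(exploded.index.tolist())

# Substring match: drops every subject that contains "maths",
# including the hypothetical 'further maths'.
no_maths_substring = (
    exploded
    .loc[lambda d: ~d["subject"].str.contains("maths")]
    .reset_index(drop=True)
)

# Exact match: drops only rows whose subject is exactly "maths".
no_maths_exact = (
    exploded
    .loc[lambda d: d["subject"].ne("maths")]
    .reset_index(drop=True)
)

print(no_maths_substring)
print(no_maths_exact)
```

Because the index labels are duplicated right after `explode`, any label-based operation at that point (for example `drop` with index labels) would affect every row sharing a label. Filtering with a boolean mask, as here, avoids that, and `reset_index(drop=True)` at the end gives unique labels again.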
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
|Solution|Source|
|---|---|
|Solution 1|mozway|