Exclude rows containing a certain string
I have a dataset which looks like:
df.head()
applicationstartdate segment fpd_30 fpd_90 fstpd_30
0 2020-01-01 00:04:10 3a.TBC Payroll with CB 0.0 0.0 0.0
1 2020-01-01 00:04:17 3a.TBC Payroll with CB 0.0 0.0 0.0
2 2020-01-01 00:14:25 1.TBC Payroll with CH (All) 0.0 0.0 0.0
3 2020-01-01 00:31:59 1.TBC Payroll with CH (All) 0.0 0.0 0.0
4 2020-01-01 00:41:49 1.TBC Payroll with CH (All) 0.0 0.0 0.
I want to exclude all the rows containing the word "Payroll" in the column "segment".
I tried:
df2 = df[~df["segment"].str.contains('Payroll')]
which yielded:
TypeError: bad operand type for unary ~: 'float'
Help would be appreciated.
Solution 1:[1]
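You likely have NaNs in your column. For those rows str.contains returns NaN rather than a boolean, which is why ~ fails. To keep the NaN rows, fill them with an empty string first: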
df2 = df[~df["segment"].fillna('').str.contains('Payroll')]
Or, if you also want to filter out the NaNs:
df2 = df[~df["segment"].fillna('Payroll').str.contains('Payroll')]
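A minimal sketch comparing the two fillna variants on a small frame. The sample rows (including the "2.Other segment" label and the NaN row) are made up for illustration and only assume that the real "segment" column contains missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the NaN row stands in for the missing values
# assumed to exist in the asker's "segment" column.
df = pd.DataFrame({
    "segment": ["3a.TBC Payroll with CB", "2.Other segment", np.nan],
    "fpd_30": [0.0, 0.0, 0.0],
})

# Fill NaN with "" so it never matches "Payroll": NaN rows are kept.
keep_nan = df[~df["segment"].fillna("").str.contains("Payroll")]

# Fill NaN with "Payroll" so it always matches: NaN rows are dropped too.
drop_nan = df[~df["segment"].fillna("Payroll").str.contains("Payroll")]

print(keep_nan.index.tolist())
print(drop_nan.index.tolist())
```

The only difference between the two is what the placeholder does to the match: an empty string never contains "Payroll", while the string "Payroll" always does.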
Solution 2:[2]
You can pass the na=True argument to str.contains. Because the condition is negated, treating NaN as a match means the NaN rows are filtered out as well.
df2 = df[~df['segment'].str.contains('Payroll', na=True)]
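As a rough illustration of how the na argument interacts with the negation, the sketch below runs both na=True and na=False on a small made-up frame. The rows are hypothetical; only the column name and the "Payroll" pattern come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in "segment".
df = pd.DataFrame({
    "segment": ["1.TBC Payroll with CH (All)", "2.Other segment", np.nan],
})

# na=True: NaN rows count as matches, so ~ removes them along with "Payroll" rows.
without_nan = df[~df["segment"].str.contains("Payroll", na=True)]

# na=False: NaN rows count as non-matches, so ~ keeps them.
with_nan = df[~df["segment"].str.contains("Payroll", na=False)]

print(without_nan.index.tolist())
print(with_nan.index.tolist())
```

If the NaN rows should survive the filter, na=False is the variant to use instead.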
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mozway |
| Solution 2 | SomeDude |
