Filtering a pandas DataFrame based on two possible values in a column
I have a DataFrame that looks like this:
Created UserID Service
1/1/2016 a CWS
1/2/2016 a Other
3/5/2016 a Drive
2/7/2017 b Enhancement
... ... ...
I want to filter it based on values in the "Service" column for CWS and Drive. I did it like this:
df=df[(df.Service=="CWS") or (df.Service=="Drive")]
It's not working. Any ideas?
Solution 1:[1]
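Python's or tries to reduce each boolean Series to a single True/False and raises a ValueError. Combine the two masks element-wise with the bitwise | operator instead: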
df=df[(df.Service=="CWS") | (df.Service=="Drive")]
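To make the difference concrete, here is a minimal sketch that rebuilds the four sample rows from the question (the frame is reconstructed by hand, and the names or_error, mask and filtered are only illustrative) and runs both forms:

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Created": ["1/1/2016", "1/2/2016", "3/5/2016", "2/7/2017"],
    "UserID": ["a", "a", "a", "b"],
    "Service": ["CWS", "Other", "Drive", "Enhancement"],
})

# `or` asks each operand for one True/False, but a boolean Series
# holds one value per row, so pandas raises instead of guessing.
try:
    df[(df.Service == "CWS") or (df.Service == "Drive")]
    or_error = None
except ValueError as exc:
    or_error = exc

# `|` combines the two masks row by row. The parentheses matter,
# because `|` binds more tightly than `==`.
mask = (df.Service == "CWS") | (df.Service == "Drive")
filtered = df[mask]
print(filtered)
```

The same rule applies to and/not: use & and ~ when combining boolean masks.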
A better option is isin:
df=df[df.Service.isin(["CWS", "Drive"])]
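As a small sketch of why isin tends to scale better than chained comparisons: the allowed values live in one list, and the same mask can be inverted with ~ to get the complement. The variable names wanted, keep and drop are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"Service": ["CWS", "Other", "Drive", "Enhancement"]})

# Hypothetical list of values to keep; extend it without touching the filter
wanted = ["CWS", "Drive"]

# Rows whose Service is one of the wanted values
keep = df[df["Service"].isin(wanted)]

# Rows whose Service is NOT one of the wanted values
drop = df[~df["Service"].isin(wanted)]

print(keep)
print(drop)
```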
Or use query:
df = df.query('Service=="CWS" | Service=="Drive"')
Or query with list:
df = df.query('Service== ["Other", "Drive"]')
print (df)
Created UserID Service
1 1/2/2016 a Other
2 3/5/2016 a Drive
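If the values to match are held in a Python variable rather than typed into the string, query can reference it with the @ prefix. The sketch below assumes a hand-built copy of the sample data; wanted, by_list and combined are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "UserID": ["a", "a", "a", "b"],
    "Service": ["CWS", "Other", "Drive", "Enhancement"],
})

# Hypothetical variable holding the values to keep
wanted = ["CWS", "Drive"]

# @wanted pulls the local Python variable into the query expression
by_list = df.query("Service in @wanted")

# Membership tests can be mixed with other conditions
combined = df.query("Service in @wanted and UserID == 'a'")

print(by_list)
print(combined)
```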
Solution 2:[2]
You can also use pandas.Series.str.match
df[df.Service.str.match('CWS|Drive')]
Created UserID Service
0 1/1/2016 a CWS
2 3/5/2016 a Drive
Other Flavors
For Fun!!
numpy-fi
s = df.Service.values
c1 = s == 'CWS'
c2 = s == 'Drive'
df[c1 | c2]
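In the same NumPy spirit, the two comparisons can probably be folded into a single np.isin call. This is only a sketch of an alternative, not something timed in this answer; to_numpy(dtype=object) is used here instead of .values on the assumption that it gives a plain NumPy array regardless of the column's string dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Service": ["CWS", "Other", "Drive", "Enhancement"]})

# Plain NumPy array of Python strings
s = df["Service"].to_numpy(dtype=object)

# One boolean per row: True where the value is in the list
mask = np.isin(s, ["CWS", "Drive"])

result = df[mask]
print(result)
```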
add numexpr
import numexpr as ne
s = df.Service.values
c1 = s == 'CWS'
c2 = s == 'Drive'
df[ne.evaluate('c1 | c2')]
Timing
isin is the winner! str.match is the loser :-(
import numpy as np
import pandas as pd
np.random.seed([3,1415])
df = pd.DataFrame(dict(
Service=np.random.choice(['CWS', 'Drive', 'Other', 'Enhancement'], 100000)))
%timeit df[(df.Service == "CWS") | (df.Service == "Drive")]
%timeit df[df.Service.isin(["CWS", "Drive"])]
%timeit df.query('Service == "CWS" | Service == "Drive"')
%timeit df.query('Service == ["Other", "Drive"]')
%timeit df.query('Service in ["Other", "Drive"]')
%timeit df[df.Service.str.match('CWS|Drive')]
100 loops, best of 3: 16.7 ms per loop
100 loops, best of 3: 4.46 ms per loop
100 loops, best of 3: 7.74 ms per loop
100 loops, best of 3: 5.77 ms per loop
100 loops, best of 3: 5.69 ms per loop
10 loops, best of 3: 67.3 ms per loop
%%timeit
s = df.Service.values
c1 = s == 'CWS'
c2 = s == 'Drive'
df[c1 | c2]
100 loops, best of 3: 5.47 ms per loop
%%timeit
import numexpr as ne
s = df.Service.values
c1 = s == 'CWS'
c2 = s == 'Drive'
df[ne.evaluate('c1 | c2')]
100 loops, best of 3: 5.65 ms per loop
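The %timeit lines above only work in IPython or Jupyter. Outside of those, a rough equivalent can be sketched with the standard-library timeit module. Everything here is an assumption for illustration: the seed, the candidates dictionary, and the repeat counts differ from the original benchmark, so the absolute numbers will not match the ones quoted above and will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape of test data as above, but with an arbitrary seed
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Service": rng.choice(["CWS", "Drive", "Other", "Enhancement"], 100_000)
})

# Hypothetical registry of the approaches to compare
candidates = {
    "or-mask": lambda: df[(df.Service == "CWS") | (df.Service == "Drive")],
    "isin": lambda: df[df.Service.isin(["CWS", "Drive"])],
    "str.match": lambda: df[df.Service.str.match("CWS|Drive")],
}

timings = {}
for name, fn in candidates.items():
    # Best of 3 repeats of 5 calls each, converted to ms per call
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    timings[name] = best * 1000
    print(f"{name:10s} {timings[name]:.2f} ms per call")
```

Taking the minimum over the repeats mirrors the "best of 3" figure that %timeit reported in the output above.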
Solution 3:[3]
The isin call must close its list before the method's closing parenthesis. The edit queue on the top answer is full, so here is the working form:
df=df[(df.Service.isin(["CWS", "Drive"]))]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | Dimanjan |
