Pandas: Check if row exists with certain values
I have a two-dimensional (or more) pandas DataFrame like this:
>>> import pandas as pd
>>> df = pd.DataFrame([[0,1],[2,3],[4,5]], columns=['A', 'B'])
>>> df
   A  B
0  0  1
1  2  3
2  4  5
Now suppose I have a NumPy array like np.array([2,3]) and want to check whether any row in df matches the contents of my array. Here the answer should obviously be True, but e.g. np.array([1,2]) should return False, as there is no row with both 1 in column A and 2 in column B.
I'm sure this is easy, but I don't see it right now.
Solution 1:[1]
An easier way is:
import numpy as np
a = np.array([2,3])
(df == a).all(axis=1).any()
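To unpack the one-liner: `df == a` compares the array against every row, `.all(axis=1)` reduces each row to a single boolean, and `.any()` reports whether at least one row matched. A minimal sketch wrapping this in a function follows; the helper name `row_exists` is hypothetical, and it assumes the array has one value per column, in column order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3], [4, 5]], columns=['A', 'B'])

def row_exists(df, values):
    """Hypothetical helper: True if any row of df equals values."""
    values = np.asarray(values)
    # Element-wise comparison, with values broadcast across every row
    matches = df == values
    # A row matches only if all of its columns matched
    return bool(matches.all(axis=1).any())

print(row_exists(df, np.array([2, 3])))  # True
print(row_exists(df, np.array([1, 2])))  # False
```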
Solution 2:[2]
If you also want to return the index where the matches occurred:
index_list = df[(df['A'] == 2)&(df['B'] == 3)].index.tolist()
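The same index lookup can be written without naming each column by building the mask from an array comparison. This is only a sketch, assuming the array `a` lists one value per column in column order; the extra duplicate row is added here purely to show that several indices can be returned.

```python
import numpy as np
import pandas as pd

# Row [2, 3] appears twice so that more than one index is returned
df = pd.DataFrame([[0, 1], [2, 3], [4, 5], [2, 3]], columns=['A', 'B'])
a = np.array([2, 3])

# True for rows where every column equals the matching entry of a
mask = (df == a).all(axis=1)
index_list = df[mask].index.tolist()
print(index_list)  # [1, 3]
```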
Solution 3:[3]
An answer that works with larger DataFrames, so you don't need to manually check each column:
import pandas as pd
import numpy as np

# define variables
df = pd.DataFrame([[0,1],[2,3],[4,5]], columns=['A', 'B'])
a = np.array([2,3])

def check_if_np_array_is_in_df(df, a):
    # transform a into a one-row dataframe with the same columns as df
    da = pd.DataFrame(np.expand_dims(a, axis=0), columns=df.columns)
    # drop duplicates from df so that only a match with a can create a duplicate
    ddf = df.drop_duplicates()
    combined = pd.concat([ddf, da])
    # 1 if a duplicates an existing row, 0 otherwise
    result = combined.shape[0] - combined.drop_duplicates().shape[0]
    return result

print(check_if_np_array_is_in_df(df, a))      # 1 (match found)
print(check_if_np_array_is_in_df(df, [1,3]))  # 0 (no match)
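For comparison, the same "does this row already exist" question can be asked with an inner merge instead of counting duplicates. This is a different technique from the one in this answer, shown only as a sketch; the helper name `row_in_df` is hypothetical, and it assumes the values are given in the same order as the columns of df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3], [4, 5]], columns=['A', 'B'])

def row_in_df(df, values):
    """Hypothetical helper: True if values matches a full row of df."""
    # One-row frame with the same column labels as df
    probe = pd.DataFrame([list(values)], columns=df.columns)
    # An inner merge on all columns keeps only rows present in both frames
    merged = df.merge(probe, on=list(df.columns), how='inner')
    return not merged.empty

print(row_in_df(df, np.array([2, 3])))  # True
print(row_in_df(df, [1, 3]))            # False
```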
Solution 4:[4]
To find rows where a single column equals a certain value:
df[df['column name'] == value]
To find rows where multiple columns equal different values, note the parentheses around each comparison (they are required because & binds more tightly than ==):
df[(df["Col1"] == Value1) & (df["Col2"] == Value2) & ...]
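Applied to the DataFrame from the question, the multi-column filter might look like the sketch below. Checking `.empty` on the result is one way to turn the filtered rows into a yes/no answer; the variable names `matches` and `no_match` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3], [4, 5]], columns=['A', 'B'])

# Each comparison is parenthesized so that & combines two boolean Series
matches = df[(df['A'] == 2) & (df['B'] == 3)]
print(not matches.empty)   # True: the row (2, 3) exists

no_match = df[(df['A'] == 1) & (df['B'] == 2)]
print(not no_match.empty)  # False: no row has A == 1 and B == 2
```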
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | acushner |
| Solution 2 | sparrow |
| Solution 3 | Yannick Pezeu |
| Solution 4 | moeabdol |
