Pandas implement an any check
How do I check whether any row in a Pandas column matches a condition? (In my case, I want to test for values of type str.)
Background: I was using df.columnName.dtype.kind == 'O' to check for strings. But then I found that columns holding Decimal values also have object dtype, so that check reports them as strings too. So I am looking for a different way to check, and what I have come up with is:
display(df.col1.apply(lambda x: isinstance(x,str)).any()) #true
But the above code evaluates isinstance on every row, which seems inefficient when there is a very large number of rows. How can I implement this check so that it stops evaluating after encountering the first True value?
Here is a more complete example:
from decimal import Decimal
import pandas as pd
data = {
'c1': [None,'a','b'],
'c2': [None,1,2],
'c3': [None,Decimal(1),Decimal(2)]
}
dx = pd.DataFrame(data)
print(dx) #displays the dataframe
print('dx.dtypes')
print(dx.dtypes) #displays the datatypes in the dataframe
print('dx.c1.dtype:',dx.c1.dtype) #'O'
print('dx.c2.dtype:',dx.c2.dtype) #'float64'
print('dx.c3.dtype:',dx.c3.dtype) #'O'!
print('dx.c1.apply(lambda x: isinstance(x,str)')
print(dx.c1.apply(lambda x: isinstance(x,str)).any())#true
print('dx.c2.apply(lambda x: isinstance(x,str)).any()')
print(dx.c2.apply(lambda x: isinstance(x,str)).any())#false
#the following line shows that the apply function applies it to every row
print('dx.c1.apply(lambda x: isinstance(x,str))')
print(dx.c1.apply(lambda x: isinstance(x,str))) #False,True,True
#and only after that is the any function applied
print('dx.c1.apply(lambda x: isinstance(x,str)).any()')
print(dx.c1.apply(lambda x: isinstance(x,str)).any())#true
The above code outputs:
     c1   c2    c3
0  None  NaN  None
1     a  1.0     1
2     b  2.0     2
dx.dtypes
c1 object
c2 float64
c3 object
dtype: object
dx.c1.dtype: object
dx.c2.dtype: float64
dx.c3.dtype: object
dx.c1.apply(lambda x: isinstance(x,str)
True
dx.c2.apply(lambda x: isinstance(x,str)).any()
False
dx.c1.apply(lambda x: isinstance(x,str))
0 False
1 True
2 True
Name: c1, dtype: bool
dx.c1.apply(lambda x: isinstance(x,str)).any()
True
Is there a better way?
More detail: I am trying to fix this line, which breaks when the column holds Decimal values: https://github.com/capitalone/datacompy/blob/8a74e60d26990e3e05d5b15eb6fb82fef62f4776/datacompy/core.py#L273
Solution 1:[1]
Copying my comment as an answer:
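It seems what you need is the built-in function any, applied to a generator expression:
any(isinstance(x,str) for x in df['col1'])
Because the generator is consumed lazily, rows are only evaluated until the first string is found. If the column contains no strings, every row is still checked.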
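To see the early exit in action, here is a minimal sketch that reuses the c1 and c3 columns from the question. The helper names contains_str and is_str_logged, and the checked list, are made up for this illustration and are not part of pandas.

```python
from decimal import Decimal

import pandas as pd


def contains_str(series: pd.Series) -> bool:
    """Return True as soon as one element of the Series is a str."""
    # The generator expression is consumed lazily, so any() stops
    # iterating at the first element for which isinstance() is True.
    return any(isinstance(x, str) for x in series)


# Demonstration data mirroring two columns of the question's example.
dx = pd.DataFrame({
    "c1": [None, "a", "b"],
    "c3": [None, Decimal(1), Decimal(2)],
})

# Hypothetical instrumentation: record every element that gets inspected.
checked = []


def is_str_logged(x) -> bool:
    checked.append(x)
    return isinstance(x, str)


found = any(is_str_logged(x) for x in dx["c1"])

print(found, len(checked))       # True 2 (None and 'a' inspected, 'b' skipped)
print(contains_str(dx["c3"]))    # False (no strings, so all rows are scanned)
```

The saving only applies when a string occurs early in the column. For a column without any strings, such as c3, the loop still visits every row, so the worst case is the same as with apply.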
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Tranbi |