Pandas: implement an "any" check

How do I check whether any row in a Pandas column matches a condition? (In my case, I want to test whether any value is a string.)

Background: I was using df.columnName.dtype.kind == 'O' to check for strings. But then I ran into columns holding decimal values (decimal.Decimal), which also have object dtype, so that check wrongly treats them as string columns. I am looking for a different way to check, and what I have come up with is:

display(df.col1.apply(lambda x: isinstance(x,str)).any()) #true

But the above code evaluates isinstance on every row, which seems inefficient when there are a very large number of rows. How can I implement the same check so that it stops evaluating as soon as it finds the first True value?
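A small sketch makes the difference measurable by counting how many times the check actually runs. The is_str function, the calls counter and the Series contents are hypothetical, made up only for this illustration (the single string is deliberately placed in the first row):

```python
import pandas as pd

# Hypothetical counter, used only to observe how often the check is called
calls = 0

def is_str(x):
    """Same test as the lambda in the question, but counting its calls."""
    global calls
    calls += 1
    return isinstance(x, str)

# Made-up data: one string in the first row, then 9,999 non-strings
s = pd.Series(["a"] + [1] * 9999)

calls = 0
found_apply = s.apply(is_str).any()    # apply builds a full boolean Series first
calls_apply = calls

calls = 0
found_gen = any(is_str(x) for x in s)  # any() pulls from the generator lazily
calls_gen = calls

print(found_apply, calls_apply)  # True 10000
print(found_gen, calls_gen)      # True 1
```

The apply version pays for every row before .any() even starts; the generator version stops after the first True.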

Here is a more complete example:

from decimal import Decimal
import pandas as pd

data = {
        'c1':  [None,'a','b'],
        'c2': [None,1,2],
        'c3': [None,Decimal(1),Decimal(2)]
       }

dx = pd.DataFrame(data)
print(dx) #displays the dataframe
print('dx.dtypes')
print(dx.dtypes) #displays the datatypes in the dataframe

print('dx.c1.dtype:',dx.c1.dtype) # object
print('dx.c2.dtype:',dx.c2.dtype) # float64
print('dx.c3.dtype:',dx.c3.dtype) # object, even though c3 holds no strings!

print('dx.c1.apply(lambda x: isinstance(x,str)')
print(dx.c1.apply(lambda x: isinstance(x,str)).any())#true
print('dx.c2.apply(lambda x: isinstance(x,str)).any()')
print(dx.c2.apply(lambda x: isinstance(x,str)).any())#false

# The following line shows that apply evaluates the lambda on every row
print('dx.c1.apply(lambda x: isinstance(x,str))')
print(dx.c1.apply(lambda x: isinstance(x,str))) # False, True, True

# and only after that is .any() applied to the resulting boolean Series
print('dx.c1.apply(lambda x: isinstance(x,str)).any()')
print(dx.c1.apply(lambda x: isinstance(x,str)).any()) # True

The above code outputs:

     c1   c2    c3
0  None  NaN  None
1     a  1.0     1
2     b  2.0     2

dx.dtypes
c1     object
c2    float64
c3     object
dtype: object

dx.c1.dtype: object
dx.c2.dtype: float64
dx.c3.dtype: object

dx.c1.apply(lambda x: isinstance(x,str)
True

dx.c2.apply(lambda x: isinstance(x,str)).any()
False

dx.c1.apply(lambda x: isinstance(x,str))
0    False
1     True
2     True
Name: c1, dtype: bool

dx.c1.apply(lambda x: isinstance(x,str)).any()
True
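The dtype output above also shows why the original dtype.kind == 'O' check misfires. A minimal sketch that rebuilds the same DataFrame and prints each column's one-character dtype.kind code ('O' for object, 'f' for floating point):

```python
from decimal import Decimal

import pandas as pd

# Same data as the example above
dx = pd.DataFrame({
    "c1": [None, "a", "b"],
    "c2": [None, 1, 2],
    "c3": [None, Decimal(1), Decimal(2)],
})

# The string column and the Decimal column both report kind 'O',
# so the kind alone cannot tell them apart
kinds = {col: dx[col].dtype.kind for col in dx.columns}
print(kinds)  # {'c1': 'O', 'c2': 'f', 'c3': 'O'}
```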

Is there a better way?

More detail: I am trying to fix this line, which breaks when the column has Decimal values: https://github.com/capitalone/datacompy/blob/8a74e60d26990e3e05d5b15eb6fb82fef62f4776/datacompy/core.py#L273
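One way such a condition could be tightened, sketched under assumptions: looks_like_string_column is a hypothetical helper, not part of datacompy, and whether it fits the logic around the linked line is not verified here. It keeps the cheap dtype test and only scans the values of object columns, stopping at the first string:

```python
from decimal import Decimal

import pandas as pd

def looks_like_string_column(series: pd.Series) -> bool:
    """Hypothetical helper: True if `series` has object dtype and holds at least one str.

    Non-object columns are rejected by the dtype test without scanning any values;
    for object columns, any() stops at the first string it meets.
    """
    if series.dtype.kind != "O":
        return False
    return any(isinstance(x, str) for x in series)

print(looks_like_string_column(pd.Series([None, "a", "b"])))                # True
print(looks_like_string_column(pd.Series([None, Decimal(1), Decimal(2)])))  # False
print(looks_like_string_column(pd.Series([1.0, 2.0])))                      # False
```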



Solution 1

Copying my comment as an answer:

It seems what you need is Python's built-in any() function, fed a generator expression instead of the result of apply:

any(isinstance(x,str) for x in df['col1'])

That way, rows are only evaluated until the first string is found: the generator produces one result at a time, and any() stops as soon as it receives True.
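Applied to the question's example, the same generator expression can be run once per column. A minimal sketch that rebuilds dx from the question's data:

```python
from decimal import Decimal

import pandas as pd

# Same data as the question's example
dx = pd.DataFrame({
    "c1": [None, "a", "b"],
    "c2": [None, 1, 2],
    "c3": [None, Decimal(1), Decimal(2)],
})

# One short-circuiting check per column: each any() stops at that column's first string
has_str = {col: any(isinstance(x, str) for x in dx[col]) for col in dx.columns}
print(has_str)  # {'c1': True, 'c2': False, 'c3': False}
```

Unlike the dtype.kind check, this correctly reports c3 (the Decimal column) as containing no strings.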

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Tranbi