Pandas: filter dataframe by data type

I have a dataframe. Here is part of it:

        member_id event_duration             domain           category
0          299819             17  element.yandex.ru               None
1          299819              0        mozilla.org          Программы
2          299819              4          vbmail.ru               None
3          299819            aaa          vbmail.ru               None

How can I filter the df by type? I usually do it with str.contains; would it be reasonable to write something like df[df.event_duration.astype(int) == True]?



Solution 1:[1]

If all the other row values are valid (that is, they are not NaN), you can convert the column to numeric with to_numeric. With errors='coerce', strings that cannot be parsed become NaN, and you can then filter those rows out with notnull:

In [47]:
df[pd.to_numeric(df['event_duration'], errors='coerce').notnull()]

Out[47]:
   member_id event_duration             domain   category
0     299819             17  element.yandex.ru       None
1     299819              0        mozilla.org  Программы
2     299819              4          vbmail.ru       None
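
A self-contained sketch of the same filter, rebuilding the sample rows from the question. It assumes event_duration was loaded as strings; the names mask and clean are only illustrative.

```python
import pandas as pd

# Sample rows from the question; event_duration is assumed to hold strings.
df = pd.DataFrame({
    "member_id": [299819, 299819, 299819, 299819],
    "event_duration": ["17", "0", "4", "aaa"],
    "domain": ["element.yandex.ru", "mozilla.org", "vbmail.ru", "vbmail.ru"],
    "category": [None, "Программы", None, None],
})

# Values that cannot be parsed become NaN, so notnull() marks the numeric rows.
mask = pd.to_numeric(df["event_duration"], errors="coerce").notnull()
clean = df[mask]

print(clean["event_duration"].tolist())  # ['17', '0', '4']
```

The filter only selects rows; the surviving values are still strings until the column itself is converted.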

This:

df[df.event_duration.astype(int) == True]

won't work: astype(int) raises a ValueError because the string 'aaa' cannot be converted to an integer.
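
A small sketch of that failure, assuming the column holds strings stored with object dtype. It also shows a second issue: even on clean data, comparing integers with True only matches the value 1.

```python
import pandas as pd

values = pd.Series(["17", "0", "4", "aaa"], dtype=object)

try:
    values.astype(int)
    failed = False
except ValueError as exc:
    # 'aaa' cannot be parsed as an integer, so the whole conversion fails.
    failed = True
    print(f"astype(int) failed: {exc}")

# On clean data the cast succeeds, but `== True` is the same as `== 1`,
# so it would keep only rows whose value is exactly 1.
only_ones = pd.Series(["17", "0", "1"], dtype=object).astype(int) == True
print(only_ones.tolist())  # [False, False, True]
```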

Solution 2:[2]

You can use df.select_dtypes(). Note that it selects whole columns by dtype rather than filtering rows:

df.select_dtypes("int")
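
To make the behaviour concrete, here is a minimal sketch on a two-column frame (the sample data is made up for illustration). A column that mixes numbers and text is not an integer column, so select_dtypes leaves it out entirely.

```python
import pandas as pd

df = pd.DataFrame({
    "member_id": [299819, 299819],     # integer column
    "event_duration": ["17", "aaa"],   # text column, not an integer dtype
})

# Keeps only the columns whose dtype is integer; the row count is unchanged.
int_cols = df.select_dtypes(include="int")

print(list(int_cols.columns))  # ['member_id']
```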

Solution 3:[3]

You can use a regex as well. This assumes every value in the column is a string; non-string values produce NaN in the mask.

df[df["event_duration"].str.contains(r"^\d+$")]
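
If the column mixes real integers with strings, one possible variant is to cast everything to text first and then match the whole value. This is a sketch under that assumption, using made-up sample values; the pattern accepts only unsigned whole numbers, so negatives and decimals are rejected.

```python
import pandas as pd

df = pd.DataFrame({"event_duration": [17, "0", "4", "aaa", "4.5"]})

# Cast to text so that .str methods work on every value, including real ints.
as_text = df["event_duration"].astype(str)

# fullmatch requires the entire string to consist of digits.
mask = as_text.str.fullmatch(r"\d+")
digits_only = df[mask]

print(mask.tolist())  # [True, True, True, False, False]
```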

Solution 4:[4]

Best solution:

col = df["event_duration"]  # branch on the column's dtype
col.fillna('') if col.dtype == 'float64' else col.fillna(0)

col.replace('orange', '5') if col.dtype == 'object' else col.fillna(0)  # 'orange' is a placeholder for a known bad string

You can find the set of distinct strings in a column that should hold integers.

s = {x for x in df["event_duration"] if isinstance(x, str)}
s

For example, the output might be:

apple
mango

Then you can filter those values out like this:

df[df["event_duration"] != 'apple']
# or
df[~df["event_duration"].isin(s)]  # drop the ~ to keep only the string rows
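
Putting the two steps together, a minimal sketch on made-up data (the values and the names bad_values, numeric_rows and string_rows are illustrative, and the column is assumed to mix Python ints with strings):

```python
import pandas as pd

df = pd.DataFrame({"event_duration": [17, 0, "apple", 4, "mango", "apple"]})

# Collect every distinct string found in the column.
bad_values = {x for x in df["event_duration"] if isinstance(x, str)}

# Rows whose value is not one of those strings, and the reverse selection.
numeric_rows = df[~df["event_duration"].isin(bad_values)]
string_rows = df[df["event_duration"].isin(bad_values)]

print(numeric_rows["event_duration"].tolist())  # [17, 0, 4]
```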

Alternatively, coerce the invalid values to NaN:

df["event_duration"] = pd.to_numeric(df["event_duration"], errors='coerce')
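
Coercion leaves NaN where the bad values were, which also turns the column into floats. If the goal is a clean integer column, one way to finish the job is sketched below; the sample data and the name cleaned are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"event_duration": ["17", "0", "4", "aaa"]})

cleaned = (
    # Unparseable values become NaN.
    df.assign(event_duration=pd.to_numeric(df["event_duration"], errors="coerce"))
      # Drop the rows that failed to parse.
      .dropna(subset=["event_duration"])
      # With no NaN left, the column can be cast back to integers.
      .astype({"event_duration": int})
)

print(cleaned["event_duration"].tolist())  # [17, 0, 4]
```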

Solution 5:[5]

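Some of the above answers seem overly complex. In most cases where a column holds mixed data types, the following should work. It keeps the rows whose value is a string; negate the mask with ~ to keep the non-string rows instead: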

df[df['event_duration'].apply(lambda x: isinstance(x, str))]
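
A short sketch of both directions of this mask, on made-up data that really does mix Python ints and strings. If the column was read from a file as text, every value is a string and the mask is True for all rows, so this approach only helps for genuinely mixed columns.

```python
import pandas as pd

df = pd.DataFrame({"event_duration": [17, 0, 4, "aaa"]})

# True where the stored Python object is a str.
is_str = df["event_duration"].apply(lambda x: isinstance(x, str))

string_rows = df[is_str]    # only the rows holding strings
other_rows = df[~is_str]    # everything else

print(other_rows["event_duration"].tolist())  # [17, 0, 4]
```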

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    EdChum
Solution 2    Vaasha
Solution 3    vks
Solution 4
Solution 5    DavidWalker