TypeError: '<=' not supported between instances of 'str' and 'float'

I want to count the rows of the clin dataframe where the OS_MONTHS value is <= 12.0. The values in OS_MONTHS are floats.

This seems like a trivial question.

import pandas as pd

len(clin["OS_MONTHS"] <= 12.0)

Traceback:

TypeError: '<=' not supported between instances of 'str' and 'float'

Data type:

type(clin["OS_MONTHS"])
pandas.core.series.Series

Dataframe

   SEX  KPS  A header  AGE  OS_MONTHS
0    1   80        44    1      11.76
1    0  100        50    1       4.73
2    1   80        40    1      23.16
3    1   80        61    1      10.58
4    1   80        20    1      35.38


Solution 1:[1]

clin["OS_MONTHS"].astype(float) <= 12.0

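If you want the count, value_counts() reports how many rows are True and how many are False:

(clin["OS_MONTHS"].astype(float) <= 12.0).value_counts()

or filter the series and take its length:

s = clin["OS_MONTHS"]
len(s[s.astype(float) <= 12.0])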

Check the unique values in your data with unique(). Some of them are not in float format, and you must handle them in some way, for example by filtering them out:

clin["OS_MONTHS"][clin["OS_MONTHS"] != '[Not Available]']
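The steps in this answer can be sketched end to end on a small sample. The data below is hypothetical; only the '[Not Available]' placeholder comes from the answer, and your column may contain other non-numeric values that unique() would reveal.

```python
import pandas as pd

# Hypothetical sample standing in for clin; values are stored as strings
clin = pd.DataFrame(
    {"OS_MONTHS": ["11.76", "4.73", "[Not Available]", "23.16", "10.58", "35.38"]}
)

# Step 1: list the distinct raw values to spot anything that is not a number
print(clin["OS_MONTHS"].unique())

# Step 2: drop the placeholder rows, then convert what is left to float
valid = clin.loc[clin["OS_MONTHS"] != "[Not Available]", "OS_MONTHS"].astype(float)

# Step 3: summing a boolean mask counts its True entries
count = int((valid <= 12.0).sum())
print(count)
```

Note that len() applied directly to the boolean mask returns the total number of rows, not the number of matches, which is why the mask is summed here.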

Solution 2:[2]

Check this out:

clin.loc[~clin["OS_MONTHS"].str.replace('.', '', regex=False).str.isdigit(), "OS_MONTHS"] = float('NaN')


# Then you can apply @MoRe's solution
clin["OS_MONTHS"].astype(float) <= 12.0
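Before relying on the isdigit() check, it may help to see which strings it accepts. The sketch below uses made-up values (not from the question) and assumes every entry in the column is a string. It suggests the check is fairly strict: negative numbers and scientific notation are treated as non-numeric and would be replaced by NaN.

```python
import pandas as pd

# Hypothetical raw values, all stored as strings
values = pd.Series(["11.76", "4.73", "-1.5", "1e3", "[Not Available]"])

# Remove literal dots (regex=False keeps '.' from acting as a wildcard),
# then test whether only digits remain
looks_numeric = values.str.replace(".", "", regex=False).str.isdigit()
print(looks_numeric.tolist())
# → [True, True, False, False, False]
```

If your data can contain signed values or exponents, a conversion that parses the number itself is likely a safer test than a digit check.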

Solution 3:[3]

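You could try pd.to_numeric with errors="coerce", the public counterpart of the private ._convert(numeric=True) method, which may not exist in your pandas version. Unlike .astype(float), this turns every value it cannot convert to a float into NaN instead of raising an error.

So that would be:

len(clin[pd.to_numeric(clin["OS_MONTHS"], errors="coerce") <= 12.0])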
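The coercion idea can be tried on a small sample with pd.to_numeric(errors="coerce"), the public pandas function for this kind of conversion. The data below is hypothetical, with '[Not Available]' assumed as the unparseable value.

```python
import pandas as pd

# Hypothetical sample standing in for clin
clin = pd.DataFrame(
    {"OS_MONTHS": ["11.76", "4.73", "[Not Available]", "23.16", "10.58"]}
)

# Unparseable entries become NaN instead of raising an error
months = pd.to_numeric(clin["OS_MONTHS"], errors="coerce")

# How many values could not be converted
n_missing = int(months.isna().sum())

# NaN <= 12.0 evaluates to False, so missing rows are not counted
n_short = int((months <= 12.0).sum())

print(n_missing, n_short)
```

Checking n_missing first is a cheap way to confirm how many rows were silently excluded from the comparison.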

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 Daniel Weigel