Use NaN for values that can't be cast using astype
I have a very large Pandas DataFrame that looks like this:
>>> d = pd.DataFrame({"a": ["1", "U", "3.4"]})
>>> d
     a
0    1
1    U
2  3.4
Currently the column has the object dtype:
>>> d.dtypes
a    object
dtype: object
I'd like to convert this column to float so that I can use groupby() and compute the mean. When I try astype, I get an error, as expected, because the string 'U' can't be cast to float:
>>> d.a.astype(float)
ValueError: could not convert string to float: 'U'
What I'd like to do is cast all the elements to float and replace the ones that can't be cast with NaN.
How can I do this?
I tried setting raise_on_error, but it doesn't work: the dtype is still object.
>>> d.a.astype(float, raise_on_error=False)
0      1
1      U
2    3.4
Name: a, dtype: object
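The request in the question (cast each element to float and fall back to NaN when the cast fails) can be sketched element-wise in plain Python. This is a minimal illustration only: the helper name `to_float_or_nan` is hypothetical, and calling a Python function once per element is likely to be much slower than a vectorized conversion on a very large DataFrame.

```python
import math

import pandas as pd


def to_float_or_nan(value):
    """Hypothetical helper: return float(value), or NaN if the cast fails."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # float("U") raises ValueError; float(None) raises TypeError
        return math.nan


d = pd.DataFrame({"a": ["1", "U", "3.4"]})

# Series.map calls the helper on every element and infers a float64 result
converted = d["a"].map(to_float_or_nan)
print(converted)
print(converted.dtype)
```

Note that Python's float() also accepts strings such as "nan" and "inf", so those would not be treated as failures by this helper.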
Solution 1:[1]
Use pd.to_numeric and specify errors='coerce' so that strings that can't be parsed as numbers become NaN:
>>> pd.to_numeric(d['a'], errors='coerce')
0    1.0
1    NaN
2    3.4
Name: a, dtype: float64
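Since the question's goal is to use groupby() and compute the mean, the coerced column can be passed straight to groupby. A minimal sketch, assuming a hypothetical grouping column `g` that is not in the original data (the `wide` frame is likewise made up for illustration). GroupBy.mean skips NaN by default, so the unparseable 'U' simply drops out of its group's average.

```python
import pandas as pd

# Same column as in the question, plus a hypothetical grouping column "g"
d = pd.DataFrame({"g": ["x", "x", "y"], "a": ["1", "U", "3.4"]})

# Unparseable strings such as "U" become NaN; the column becomes float64
d["a"] = pd.to_numeric(d["a"], errors="coerce")

# mean() skips NaN, so group "x" averages only the value 1.0
means = d.groupby("g")["a"].mean()
print(means)

# Several columns at once: DataFrame.apply forwards errors="coerce"
# to pd.to_numeric for each column (hypothetical example data)
wide = pd.DataFrame({"a": ["1", "U"], "b": ["2.5", "-"]})
wide = wide.apply(pd.to_numeric, errors="coerce")
print(wide.dtypes)
```

Coercing before grouping keeps the aggregation purely numeric; the NaNs only affect the count of values that contribute to each group's mean.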
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
