Cannot perform std with type object in Dask

Performing an ordinary calculation on a Dask DataFrame gives me an error:

x_std = x.std().compute()

Computing head:

x.head()

    LocalTime               Ask     Bid
0   2004.10.25 00:01:01.975 86.837  86.877
1   2004.10.25 00:01:19.300 86.791  86.891
2   2004.10.25 00:01:30.759 86.812  86.842
3   2004.10.25 00:01:41.798 86.801  86.831
4   2004.10.25 00:01:42.213 86.794  86.824

Error:

TypeError: cannot perform std with type object

I was following the documentation ...



Solution 1:[1]

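The output of x.head() shows that LocalTime holds timestamps. Unless it has been converted explicitly, though, it is most likely stored as an object (string) column, and std is not defined for object columns. To check the dtypes, run the following (ddf here is the Dask DataFrame, called x in the question):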

print(ddf.dtypes)

To convert, use dd.to_datetime as explained in this blog post:

from dask.dataframe import to_datetime

# note this overwrites the original column
ddf["LocalTime"] = to_datetime(ddf["LocalTime"])

If the other two columns, Ask and Bid, are also object columns, they need a separate conversion to numeric (see this blog post for details). With errors="coerce", any value that cannot be parsed becomes NaN instead of raising an error:

from dask.dataframe import to_numeric

ddf["Ask"] = to_numeric(ddf["Ask"], errors="coerce")
ddf["Bid"] = to_numeric(ddf["Bid"], errors="coerce")

After conversion, ddf_std = ddf.std().compute() should work without error.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
