How to convert dtype='datetime64[ns]' to float?
I am practicing linear regression, passing dates as the input x and expecting a float output y:
x = df['Date'].values
x = x.reshape(-1, 1)
y = df['MeanTemp'].values  # the MeanTemp column holds float values
y = y.reshape(-1, 1)
When I print x, the output is:
array([['1942-07-01T00:00:00.000000000'],
['1942-07-02T00:00:00.000000000'],
['1942-07-03T00:00:00.000000000'],
['1942-07-04T00:00:00.000000000'],
['1942-07-05T00:00:00.000000000'],
['1942-07-06T00:00:00.000000000'],
['1942-07-07T00:00:00.000000000'],
['1942-07-08T00:00:00.000000000'],
['1942-07-09T00:00:00.000000000'],
['1942-07-10T00:00:00.000000000']], dtype='datetime64[ns]')
Next, I fit a linear regression:
from sklearn.linear_model import LinearRegression
linlin = LinearRegression()
linlin.fit(x, y)
This raises no error, but when I call
linlin.predict(x)
TypeError: The DTypes <class 'numpy.dtype[float64]'> and <class 'numpy.dtype[datetime64]'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is `object`.
the TypeError above is raised. How do I convert this data type to float so that predict works correctly?
Solution 1:[1]
You can use NumPy to express each date as the number of days elapsed since the earliest date:
>>> import numpy as np
>>> df['date_delta'] = (df['Date'] - df['Date'].min()) / np.timedelta64(1,'D')
>>> x = df['date_delta'].values
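As a minimal sketch of the day-offset idea, the snippet below uses plain NumPy arrays in place of the DataFrame columns. The sample dates and temperatures are made up for illustration, `to_days` is a hypothetical helper name, and `np.polyfit` stands in for `LinearRegression` only to keep the example free of extra dependencies. The point it tries to show is that the reference date has to be kept and reused when converting new dates for prediction.

```python
import numpy as np

# Hypothetical sample data standing in for df['Date'] and df['MeanTemp'].
dates = np.array(['1942-07-01', '1942-07-02', '1942-07-03', '1942-07-04'],
                 dtype='datetime64[ns]')
temps = np.array([25.5, 26.0, 26.5, 27.0])

# Keep the reference date so the same offset can be applied to new dates.
ref = dates.min()

def to_days(d, ref):
    """Return the time elapsed since `ref`, in days, as float64."""
    return (d - ref) / np.timedelta64(1, 'D')

x = to_days(dates, ref)  # float64 day offsets, starting at 0.0

# Straight-line least-squares fit (a stand-in for LinearRegression).
slope, intercept = np.polyfit(x, temps, deg=1)

# A new date must be converted with the same reference date before predicting.
new_x = to_days(np.datetime64('1942-07-06', 'ns'), ref)
pred = slope * new_x + intercept
```

If the model is fitted with scikit-learn instead, the converted array would still need `reshape(-1, 1)` as in the question, since the estimator expects a 2-D input.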
Or you can convert each date to a floating-point fractional year (the year plus the fraction of that year already elapsed) using the following function:
>>> import numpy as np
>>> import pandas as pd
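>>> def dt64_to_float(dt64):
...     # Truncate to the start of the year, then measure the offset within it.
...     year = dt64.astype('M8[Y]')
...     days = (dt64 - year).astype('timedelta64[D]')
...     year_next = year + np.timedelta64(1, 'Y')
...     # Length of the year in days (365 or 366).
...     days_of_year = (year_next.astype('M8[D]') - year.astype('M8[D]')).astype('timedelta64[D]')
...     # year.astype(float) counts years since 1970, hence the 1970 offset.
...     dt_float = 1970 + year.astype(float) + days / days_of_year
...     return dt_float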
>>> df['date_float'] = dt64_to_float(df['Date'].to_numpy())
>>> x = df['date_float'].values
Solution 2:[2]
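Just cast both x and y to float64. For a datetime64[ns] column, the resulting floats are nanoseconds since the Unix epoch (1970-01-01).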
x = df[('Date')].values.astype("float64")
y = df['MeanTemp'].values.astype("float64")
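To make the units of this cast concrete, here is a small sketch on a made-up NumPy array (the dates and the `NS_PER_DAY` constant are illustrative, not from the original answer). It suggests one optional follow-up step: rescaling the nanosecond values to days, which keeps the feature at a more readable magnitude.

```python
import numpy as np

# Hypothetical dates; the first is the Unix epoch itself.
dates = np.array(['1970-01-01', '1970-01-02', '1942-07-01'],
                 dtype='datetime64[ns]')

# Casting datetime64[ns] to float64 yields nanoseconds since 1970-01-01.
ns = dates.astype('float64')

# Optional rescaling: nanoseconds -> days since the epoch.
NS_PER_DAY = 24 * 60 * 60 * 1e9
days = ns / NS_PER_DAY

# Dates before 1970 come out negative.
print(ns[1], days[1], days[2] < 0)
```

A linear fit is mathematically unaffected by this rescaling, so the choice is mainly about readable coefficients and numerical comfort.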
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | ?aky |
