How to convert dtype='datetime64[ns]' to float?

I am practicing linear regression. I pass dates as the input x and expect a float output y.

x = df['Date'].values
x = x.reshape(-1, 1)
y = df['MeanTemp'].values  # MeanTemp column has float values
y = y.reshape(-1, 1)

When I print x, the output is:

array([['1942-07-01T00:00:00.000000000'],
       ['1942-07-02T00:00:00.000000000'],
       ['1942-07-03T00:00:00.000000000'],
       ['1942-07-04T00:00:00.000000000'],
       ['1942-07-05T00:00:00.000000000'],
       ['1942-07-06T00:00:00.000000000'],
       ['1942-07-07T00:00:00.000000000'],
       ['1942-07-08T00:00:00.000000000'],
       ['1942-07-09T00:00:00.000000000'],
       ['1942-07-10T00:00:00.000000000']], dtype='datetime64[ns]')

Now, when I use linear regression

linlin = LinearRegression()
linlin.fit(x, y)

This does not raise an error, but when I call

linlin.predict(x)

I get the following TypeError:

TypeError: The DTypes <class 'numpy.dtype[float64]'> and <class 'numpy.dtype[datetime64]'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is `object`.

How do I convert this data type to float so that the predict function works correctly?



Solution 1:[1]

You can use the number of days elapsed since the earliest date, obtained by dividing the time difference by a NumPy timedelta of one day:

>>> import numpy as np

>>> df['date_delta'] = (df['Date'] - df['Date'].min())  / np.timedelta64(1,'D')
>>> x = df['date_delta'].values
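One thing to watch with this approach is that any date passed to predict later has to be measured from the same reference date used for fitting. A minimal sketch of that idea follows; the sample data and the helper name `to_days` are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's DataFrame.
df = pd.DataFrame({
    "Date": pd.to_datetime(["1942-07-01", "1942-07-02", "1942-07-04"]),
    "MeanTemp": [25.5, 26.1, 24.9],
})

# Keep the reference date so later inputs are converted the same way.
origin = df["Date"].min()

def to_days(dates, origin):
    """Return days elapsed since `origin` as a float64 column vector."""
    delta = pd.to_datetime(dates) - origin
    days = delta / np.timedelta64(1, "D")
    return days.to_numpy(dtype="float64").reshape(-1, 1)

x = to_days(df["Date"], origin)                      # training input
new_x = to_days(pd.Series(["1942-07-10"]), origin)   # unseen date, same origin
```

Here `x` and `new_x` are both plain float64 arrays of shape (n, 1), which is the layout scikit-learn estimators expect for a single feature.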

Alternatively, you can convert each date to a fractional year (the year plus the fraction of that year already elapsed) using the following function:

>>> import numpy as np
>>> import pandas as pd

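>>> def dt64_to_float(dt64):
...     # Truncate each timestamp to the start of its year.
...     year = dt64.astype('M8[Y]')
...     # Days elapsed since the start of that year.
...     days = (dt64 - year).astype('timedelta64[D]')
...     year_next = year + np.timedelta64(1, 'Y')
...     # Length of the year in days (365 or 366).
...     days_of_year = (year_next.astype('M8[D]') - year.astype('M8[D]')).astype('timedelta64[D]')
...     # year.astype(float) counts years since 1970, hence the 1970 offset.
...     dt_float = 1970 + year.astype(float) + days / (days_of_year)
...     return dt_float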

>>> df['date_float'] = dt64_to_float(df['Date'].to_numpy())
>>> x = df['date_float'].values
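To make the fractional-year idea concrete, the same arithmetic can be worked through for a single date. This is only an illustrative sketch using the first date from the question; the variable names are chosen here and do not appear in the original answer.

```python
import numpy as np

d = np.datetime64("1942-07-01")

# Start of the date's own year and of the following year, at day resolution.
year_start = d.astype("M8[Y]").astype("M8[D]")
next_year_start = (d.astype("M8[Y]") + 1).astype("M8[D]")

# Days already elapsed in the year, and the total length of that year.
elapsed = (d - year_start) / np.timedelta64(1, "D")
length = (next_year_start - year_start) / np.timedelta64(1, "D")

# datetime64[Y] stores years since 1970, so add 1970 back.
calendar_year = int(d.astype("M8[Y]").astype(int)) + 1970
frac_year = calendar_year + elapsed / length
```

For 1 July 1942 this gives 181 elapsed days out of 365, so `frac_year` is 1942 + 181/365, a little under 1942.5.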

Solution 2:[2]

Just cast both x and y to float64, reshaping them into column vectors as in the question.

x = df['Date'].values.astype("float64").reshape(-1, 1)
y = df['MeanTemp'].values.astype("float64").reshape(-1, 1)
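Note that casting a datetime64[ns] array to float64 yields nanoseconds since 1970-01-01, so the values are on the order of 1e17 for dates like those in the question. The sketch below shows the raw result and one possible rescaling to days; the rescaling step is an assumption added here, not part of the original answer.

```python
import numpy as np

dates = np.array(
    ["1970-01-01", "1970-01-02", "1942-07-01"], dtype="datetime64[ns]"
)

# Raw cast: nanoseconds since the Unix epoch, as float64.
as_ns = dates.astype("float64")

# Optional rescaling (an assumption for readability): nanoseconds to days.
NS_PER_DAY = 86_400e9
as_days = as_ns / NS_PER_DAY

# Column vector, as scikit-learn expects for a single feature.
x = as_days.reshape(-1, 1)
```

Dates before 1970 come out negative. A linear model fits equally well on either scale, but the day-based numbers are easier to read and keep the fitted slope in degrees per day rather than per nanosecond.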

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 ?aky