Leave-one-out KFold for a linear regression in Python
I am trying to run a leave-one-out k-fold validation on a linear regression model I have, but my script keeps ending with nan values. x7 is my true values and y7 is my modeled values. Why do I keep getting an error at the end?
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression  # missing in the original script
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

x7 = np.array([16.36, 24.67, 52.31, 87.31, 3.98, 63.45, 40.47, 35.67, 52.12, 9.39, 57.61, 35.77, 113.1])
a = np.reshape(x7, (-1, 1))
y7 = np.array([19.678974, 4.824257, 75.617537, 62.587548, 40.287506, 76.576852, 38.777129, 29.062245,
               50.088907, 34.415783, 46.466144, 44.848378, 68.988740])
b = np.reshape(y7, (-1, 1))

a_train, a_test, b_train, b_test = train_test_split(x7, y7, test_size=12,
                                                    random_state=None)
train_test_split(b, shuffle=True)  # note: this result is never used
kfolds = KFold(n_splits=13, random_state=None)
model = LinearRegression()
score = cross_val_score(model, a, b, cv=kfolds)
print(score)
Solution 1
If you run it, you will see the error:
UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
When you don't specify a scoring metric, cross_val_score falls back to the default scorer for LinearRegression, which is R². R² cannot be computed from a single test sample.
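A minimal sketch of why the warning appears: R² is defined as 1 - SS_res / SS_tot, and with a single held-out sample the total sum of squares is always zero (the sample value below is made up for illustration):

```python
import numpy as np

# R^2 = 1 - SS_res / SS_tot. With one held-out sample,
# SS_tot = sum((y - mean(y))^2) is identically zero, so the
# division is undefined and scikit-learn reports nan.
y_true = np.array([42.0])  # a single test sample (hypothetical value)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
print(ss_tot)  # 0.0
```

This is exactly the situation KFold(n_splits=13) creates on 13 samples: every test fold contains one observation.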
In your case, check out the available scoring options and decide which one is suitable. One option is mean squared error (note that scikit-learn returns the negative of the MSE):
score = cross_val_score(model, a, b, cv=kfolds, scoring="neg_mean_squared_error")
score
array([ -191.24253413, -1196.96087661, -849.60502864, -17.24243385,
-371.71996402, -623.67802306, -21.95720802, -163.79409063,
-2.16490531, -62.32600883, -29.3290439 , -19.44669535,
-315.64087633])
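To turn the per-fold negative MSE values above into a single error number, one common approach is to average them and take the square root, giving an overall leave-one-out RMSE. A sketch with made-up data (scikit-learn's LeaveOneOut splitter is equivalent to KFold(n_splits=13) on 13 samples):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical data standing in for the question's x7/y7 arrays.
X = np.linspace(0, 10, 13).reshape(-1, 1)
y = 3.0 * X.ravel() + np.sin(X.ravel())

# LeaveOneOut holds out exactly one sample per fold, so it matches
# KFold(n_splits=13) on 13 samples.
scores = cross_val_score(LinearRegression(), X, y,
                         cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")
print(scores.shape)  # (13,) -- one negative squared error per held-out sample

# Aggregate the fold scores into one overall RMSE.
rmse = np.sqrt(np.mean(-scores))
print(rmse)
```

Each entry of `scores` is the negative squared error of one held-out sample, so negating, averaging, and taking the square root yields the usual RMSE over all 13 predictions.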
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | StupidWolf |
