Theil-Sen Regression: different results when translating x-axis
I want to fit a Theil-Sen regression (using scikit-learn) on a time series. I tried two things:
- fitting the regressor directly on the years (X = {2002:2019})
- fitting the regressor on the years minus the minimum year (X = {0:17})
I would have expected the results to be the same, but they are different. If I use an OLS regression, they are indeed similar. What am I missing?
import numpy as np
from sklearn.linear_model import TheilSenRegressor, LinearRegression
y = np.array(
[688., 895., 1673., 1077., 855., 1064., 1226., 3900., 699., 699., 2726., 1383., 1542., 2132., 1275., 969., 2789.,
2576.])
X = np.arange(len(y)).reshape(-1, 1)
X2 = X + 2002
y_pred2 = TheilSenRegressor(random_state=0).fit(X2, y).predict(X2)
print(y_pred2)
y_pred = TheilSenRegressor(random_state=0).fit(X, y).predict(X)
import matplotlib.pyplot as plt
fig, axarr = plt.subplots(2, 2)
axarr[0, 0].scatter(X, y)
axarr[0, 0].plot(X, y_pred, color='orange')
axarr[0, 0].title.set_text('Theil-Sen: X')
axarr[0, 1].scatter(X2, y)
axarr[0, 1].plot(X2, y_pred2, color='orange')
axarr[0, 1].title.set_text('Theil-Sen: Shifted X')
axarr[1, 0].scatter(X, y)
axarr[1, 0].plot(X, LinearRegression().fit(X, y).predict(X), color='orange')
axarr[1, 0].title.set_text('OLS: X')
axarr[1, 1].scatter(X2, y)
axarr[1, 1].plot(X2, LinearRegression().fit(X2, y).predict(X2), color='orange')
axarr[1, 1].title.set_text('OLS: Shifted X')
plt.show()
Solution 1:[1]
I'm not sure why scikit-learn produces this result. If you are happy to use scipy.stats.mstats.theilslopes instead, it produces the expected result:
import numpy as np
from scipy.stats.mstats import theilslopes, linregress
import matplotlib.pyplot as plt
Y = np.array(
[688., 895., 1673., 1077., 855., 1064., 1226., 3900., 699., 699., 2726., 1383., 1542., 2132., 1275., 969., 2789.,
2576.])
# 1-D x values are enough here; no column vector is needed
X1 = np.arange(len(Y))
X2 = X1 + 2002
# theilslopes returns (slope, intercept, lower slope bound, upper slope bound)
model1 = theilslopes(Y, X1)
model2 = theilslopes(Y, X2)
Y1_pred = model1[1] + model1[0] * X1
Y2_pred = model2[1] + model2[0] * X2
# linregress returns the slope first, then the intercept
model1lr = linregress(X1, Y)
model2lr = linregress(X2, Y)
Y1lr_pred = model1lr[1] + model1lr[0] * X1
Y2lr_pred = model2lr[1] + model2lr[0] * X2
fig, axarr = plt.subplots(2, 2)
axarr[0, 0].scatter(X1, Y)
axarr[0, 0].plot(X1, Y1_pred, color='orange')
axarr[0, 0].title.set_text('Theil-Sen: X')
axarr[0, 1].scatter(X2, Y)
axarr[0, 1].plot(X2, Y2_pred, color='orange')
axarr[0, 1].title.set_text('Theil-Sen: Shifted X')
axarr[1, 0].scatter(X1, Y)
axarr[1, 0].plot(X1, Y1lr_pred, color='orange')
axarr[1, 0].title.set_text('OLS: X')
axarr[1, 1].scatter(X2, Y)
axarr[1, 1].plot(X2, Y2lr_pred, color='orange')
axarr[1, 1].title.set_text('OLS: Shifted X')
plt.pause(1)
plt.show(block=True)
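For reference, the estimator behind the expected result can be sketched in plain Python. This is a minimal illustration, assuming the classic one-dimensional definition (median of all pairwise slopes, with the intercept taken as median(y) minus slope times median(x)). The helper name theil_sen_1d is hypothetical and this is not scipy's actual implementation. Under that definition, shifting x by a constant leaves every pairwise slope unchanged, so the fitted line should not move.

```python
from itertools import combinations
from statistics import median


def theil_sen_1d(x, y):
    """Hypothetical helper: median of pairwise slopes, intercept from medians."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # skip pairs with identical x (no slope defined)
    ]
    slope = median(slopes)
    intercept = median(y) - slope * median(x)
    return slope, intercept


y = [688., 895., 1673., 1077., 855., 1064., 1226., 3900., 699., 699.,
     2726., 1383., 1542., 2132., 1275., 969., 2789., 2576.]
x = list(range(len(y)))              # 0 .. 17
x_shifted = [xi + 2002 for xi in x]  # 2002 .. 2019

slope, intercept = theil_sen_1d(x, y)
slope_s, intercept_s = theil_sen_1d(x_shifted, y)

# Predictions on the original and on the shifted axis
pred = [intercept + slope * xi for xi in x]
pred_s = [intercept_s + slope_s * xi for xi in x_shifted]

print("slopes equal:", slope == slope_s)
print("largest prediction difference:",
      max(abs(a - b) for a, b in zip(pred, pred_s)))
```

The intercepts differ between the two fits, as they must, but the predicted values agree up to floating-point rounding.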
I hope this helps, but we still need to figure out what's going on in scikit-learn.
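A possible explanation, stated as an assumption rather than a confirmed diagnosis: the scikit-learn documentation describes TheilSenRegressor as computing least-squares solutions on subsets of the samples and then taking the spatial median of the resulting slopes and intercepts. A spatial median minimises Euclidean distances in (intercept, slope) space, and shifting x by a constant c maps each intercept b to b - c * slope. That is a shear of the parameter space, not a translation, so the spatial median need not follow it. The sketch below does not reproduce scikit-learn's code. It only computes the exact line through every pair of points (the helper names are hypothetical) and compares how widely the intercepts are spread before and after the shift.

```python
from itertools import combinations

y = [688., 895., 1673., 1077., 855., 1064., 1226., 3900., 699., 699.,
     2726., 1383., 1542., 2132., 1275., 969., 2789., 2576.]
x = list(range(len(y)))              # 0 .. 17
x_shifted = [xi + 2002 for xi in x]  # 2002 .. 2019


def pairwise_lines(x, y):
    """Hypothetical helper: (intercept, slope) of the line through each pair."""
    lines = []
    for i, j in combinations(range(len(x)), 2):
        slope = (y[j] - y[i]) / (x[j] - x[i])
        lines.append((y[i] - slope * x[i], slope))
    return lines


def spread(values):
    """Range (max minus min) of a list of numbers."""
    return max(values) - min(values)


lines = pairwise_lines(x, y)
lines_shifted = pairwise_lines(x_shifted, y)

slope_spread = spread([m for _, m in lines])
slope_spread_shifted = spread([m for _, m in lines_shifted])
intercept_spread = spread([b for b, _ in lines])
intercept_spread_shifted = spread([b for b, _ in lines_shifted])

print("slope spread (X, shifted X):", slope_spread, slope_spread_shifted)
print("intercept spread (X):", intercept_spread)
print("intercept spread (shifted X):", intercept_spread_shifted)
```

The slopes are identical in both cases, but on the shifted axis the intercepts are spread far more widely, so in a Euclidean distance over (intercept, slope) the intercept coordinate dominates. If the assumption above is right, centering x before fitting (as the X = {0:17} variant effectively does) would be the more stable choice.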
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | TristanDJGraham |

