Theil-Sen Regression: different results when translating x-axis

I want to fit a Theil-Sen regression (using scikit-learn) on a time series. I tried two things:

  • fitting the regressor directly on the years (X = {2002:2019})
  • fitting the regressor on the years minus the minimum year (X = {0:17})

I would have expected the results to be the same, but they are different. If I use an OLS regression, they are indeed similar. What am I missing?


import numpy as np
from sklearn.linear_model import TheilSenRegressor, LinearRegression

y = np.array(
    [688., 895., 1673., 1077., 855., 1064., 1226., 3900., 699., 699., 2726., 1383., 1542., 2132., 1275., 969., 2789.,
     2576.])

X = np.arange(len(y)).reshape(-1, 1)

X2 = X + 2002

y_pred2 = TheilSenRegressor(random_state=0).fit(X2, y).predict(X2)
print(y_pred2)

y_pred = TheilSenRegressor(random_state=0).fit(X, y).predict(X)


import matplotlib.pyplot as plt

fig, axarr = plt.subplots(2, 2)


axarr[0, 0].scatter(X, y)
axarr[0, 0].plot(X, y_pred, color='orange')
axarr[0, 0].title.set_text('Theil-Sen: X')

axarr[0, 1].scatter(X2, y)
axarr[0, 1].plot(X2, y_pred2, color='orange')
axarr[0, 1].title.set_text('Theil-Sen: Shifted X')

axarr[1, 0].scatter(X, y)
axarr[1, 0].plot(X, LinearRegression().fit(X, y).predict(X), color='orange')
axarr[1, 0].title.set_text('OLS: X')

axarr[1, 1].scatter(X2, y)
axarr[1, 1].plot(X2, LinearRegression().fit(X2, y).predict(X2), color='orange')
axarr[1, 1].title.set_text('OLS: Shifted X')

plt.show()


Solution 1:[1]

I'm not sure why scikit-learn produces this result. If you are happy to use scipy.stats.mstats.theilslopes instead, it produces the expected result:

import numpy as np
from scipy.stats.mstats import theilslopes, linregress
import matplotlib.pyplot as plt

Y = np.array(
    [688., 895., 1673., 1077., 855., 1064., 1226., 3900., 699., 699., 2726., 1383., 1542., 2132., 1275., 969., 2789.,
     2576.])

# 1-D arrays are enough here; the scipy functions do not need a column vector
X1 = np.arange(len(Y))

X2 = X1 + 2002

model1 = theilslopes(Y, X1)
model2 = theilslopes(Y, X2)

Y1_pred = model1[1] + model1[0] * X1
Y2_pred = model2[1] + model2[0] * X2

model1lr = linregress(X1, Y)
model2lr = linregress(X2, Y)

Y1lr_pred = model1lr[1] + model1lr[0] * X1
Y2lr_pred = model2lr[1] + model2lr[0] * X2

fig, axarr = plt.subplots(2, 2)

axarr[0, 0].scatter(X1, Y)
axarr[0, 0].plot(X1, Y1_pred, color='orange')
axarr[0, 0].title.set_text('Theil-Sen: X')

axarr[0, 1].scatter(X2, Y)
axarr[0, 1].plot(X2, Y2_pred, color='orange')
axarr[0, 1].title.set_text('Theil-Sen: Shifted X')

axarr[1, 0].scatter(X1, Y)
axarr[1, 0].plot(X1, Y1lr_pred, color='orange')
axarr[1, 0].title.set_text('OLS: X')

axarr[1, 1].scatter(X2, Y)
axarr[1, 1].plot(X2, Y2lr_pred, color='orange')
axarr[1, 1].title.set_text('OLS: Shifted X')

plt.pause(1)
plt.show(block=True)

I hope this helps, but we still need to figure out what's going on in scikit-learn.
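To double-check that the shift really should not matter, here is a minimal sketch that computes the textbook Theil-Sen slope by hand as the median of all pairwise slopes. The helper name pairwise_median_slope is made up for this example, and the intercept convention (median(y) - slope * median(x)) is an assumption chosen to match what I understand scipy's default to be. It only checks that the textbook estimator is invariant to shifting x; it does not explain what scikit-learn does internally.

```python
import numpy as np
from itertools import combinations


def pairwise_median_slope(x, y):
    """Median of the slopes between all pairs of points with distinct x.

    Hypothetical helper for illustration, not a scikit-learn or scipy function.
    """
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    return np.median(slopes)


y = np.array(
    [688., 895., 1673., 1077., 855., 1064., 1226., 3900., 699., 699., 2726.,
     1383., 1542., 2132., 1275., 969., 2789., 2576.])
x = np.arange(len(y), dtype=float)
x_shifted = x + 2002

# Pairwise differences x[j] - x[i] do not change under a shift,
# so the slope should come out the same for both inputs.
slope_raw = pairwise_median_slope(x, y)
slope_shifted = pairwise_median_slope(x_shifted, y)

# Assumed intercept convention: median(y) - slope * median(x).
intercept_raw = np.median(y) - slope_raw * np.median(x)
intercept_shifted = np.median(y) - slope_shifted * np.median(x_shifted)

pred_raw = intercept_raw + slope_raw * x
pred_shifted = intercept_shifted + slope_shifted * x_shifted

print("slope (raw x):    ", slope_raw)
print("slope (shifted x):", slope_shifted)
print("same fitted values:", np.allclose(pred_raw, pred_shifted))
```

With this definition, only the intercept changes when x is shifted and the fitted line through the data stays the same, which is the behaviour the question expected.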

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 TristanDJGraham