ValueError when printing model summary in Linear Regression using sklearn

I want to print the model summary of my fitted simple linear regression, but I am getting an error. Here is the code:

# importing libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import statsmodels.api as sm
data = pd.read_csv("boston.csv")

# independent variables
x = data.drop(["MEDV"], axis=1)
# dependent variable. This is the one I will use
y = data[["MEDV"]]

# Creating dummy variables for CHAS, which is a "yes/no" variable, in this case "0/1" (I transformed it)
y = pd.get_dummies(data, columns=["CHAS"], drop_first=True)
# Adding a constant column to the independent variables
x = sm.add_constant(x)

# splitting the data
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.30, random_state=1
)
# fitting a model
model = sm.OLS(y_train, x_train).fit()

# let's print the regression summary
print(model.summary())

Then, I get this error:

ValueError: shapes (354,13) and (354,13) not aligned: 13 (dim 1) != 354 (dim 0)

What can I do about it? Nothing I have read online has helped so far. I have run these same lines for other exercises before without any problem, so I have no idea where the error is coming from. I will also attach a screenshot of the full error output from Python.
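The wording of the message matches the error NumPy's `np.dot` raises when the inner dimensions of a matrix product differ: a (354, 13) array times another (354, 13) array is undefined because 13 != 354. The minimal sketch below uses random arrays as stand-ins (an assumption; it does not load boston.csv) to reproduce the message, and shows that a one-dimensional target of length 354 multiplies cleanly against the transposed design matrix. That both shapes in the traceback are (354, 13) suggests the target handed to `sm.OLS` has as many columns as the predictors, rather than being the single MEDV column.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(354, 13))  # same shape as x_train in the traceback
b = rng.normal(size=(354, 13))  # a 2-D "target" with 13 columns

message = ""
try:
    np.dot(a, b)  # inner dimensions 13 and 354 do not match
except ValueError as err:
    message = str(err)
print(message)

# With a 1-D target of length 354, X.T @ y is well defined
target = rng.normal(size=354)
print((a.T @ target).shape)
```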


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
