ValueError when printing model summary in Linear Regression using sklearn
I want to print the model summary of my fitted simple linear regression, but I am getting an error. Here is the code:
#importing libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import statsmodels.api as sm
data = pd.read_csv("boston.csv")
#independent variables
x = data.drop(["MEDV"], axis=1)
#dependent variable. This is the one I will use
y = data[["MEDV"]]
#Creating dummy variables for CHAS, which is a "yes/no" variable, in this case, "0/1" (I transformed it)
y = pd.get_dummies (data, columns=["CHAS"], drop_first=True)
#Creating a constant to the independent variables
x = sm.add_constant (x)
#splitting the data
x_train, x_test, y_train, y_test = train_test_split (
x,y,test_size=0.30, random_state=1
)
#fitting a model
model = sm.OLS(y_train, x_train).fit()
# let's print the regression summary
print(model.summary())
Then, I get this error:
ValueError: shapes (354,13) and (354,13) not aligned: 13 (dim 1) != 354 (dim 0)
What can I do about it? Nothing I have read online has helped so far. I have run these same lines of code in earlier exercises without any problem, so I have no clue where the error is coming from. I can also attach a screenshot of the full traceback.

Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
