Linear Regression ValueError: Input contains NaN, infinity or a value too large for dtype('float64')
I'm working on a linear regression model and I'm getting the error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64')
Here's my code:
### List Column Data Types for df
# Convert 'Paid' column to float64 by first changing NaN to 0
Training_Data['Paid'].fillna(0).astype(float)
# Convert 'Sale Price' column to float64 by first changing NaN to 0
#print(df.loc[pd.to_numeric(df['Sale Price'], errors='coerce').isnull()])
#pd.to_numeric(df['Sale Price']).astype(int)
Training_Data["Sale Price"] = Training_Data["Sale Price"].astype(str).str.strip().replace("",0).astype(float)
# List Data Types
Training_Data.dtypes
Which returns:
Paid          float64
Sale Price    float64
dtype: object
### List Column Data Types for df2
# Convert 'Paid' column to float64 by first changing NaN to 0
Test_Data['Paid'].fillna(0).astype(float)
# Convert 'Sale Price' column to float64 by first changing NaN to 0
#print(df.loc[pd.to_numeric(df['Sale Price'], errors='coerce').isnull()])
#pd.to_numeric(df['Sale Price']).astype(int)
Test_Data["Sale Price"] = Test_Data["Sale Price"].astype(str).str.strip().replace("",0).astype(float)
# List Data Types
Test_Data.dtypes
Which returns:
Paid          float64
Sale Price    float64
dtype: object
### Declare and Drop Dependent (Measured) Variable
SourceData_train_independent = Training_Data.drop(['Sale Price'], axis = 1) # Drop dependent variable from training dataset
SourceData_train_dependent = Training_Data['Sale Price'].copy() # New Series with only the dependent variable for the training dataset
SourceData_test_independent = Test_Data.drop(['Sale Price'], axis = 1)
SourceData_test_dependent = Test_Data['Sale Price'].copy()
SourceData_train_independent.dtypes
Which returns:
Paid    float64
dtype: object
### Scaling Independent Train and Test Variable
sc_X = StandardScaler()
X_train = sc_X.fit_transform(SourceData_train_independent.values) # scale the independent variables
y_train = SourceData_train_dependent # scaling is not required for the dependent variable
X_test = sc_X.transform(SourceData_test_independent)
y_test = SourceData_test_dependent
Finally, when I run:
### Feeding Train Data
reg = LinearRegression().fit(X_train, y_train)
print("The Linear regression score on training data is ",
round(reg.score(X_train, y_train),2))
I get the error. So I'm thinking my file still has NaN values, which I thought I had corrected. Can anyone help? Thanks!
Solution 1:[1]
Try this to find out which columns still contain NaN or infinite values:
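import numpy as np

def check_nan_inf(df):
    # Print the name of every column that contains NaN or +/-inf.
    # np.isinf expects numeric columns, which is the case for the float64 data above.
    for col in df.columns:
        if df[col].isnull().any():
            print(col, 'has nan')
        if np.isinf(df[col]).any():
            print(col, 'has inf')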
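If the check reports NaN, two lines in the question's code are likely candidates. The result of `fillna(0).astype(float)` on 'Paid' is never assigned back, so the column is left unchanged. The 'Sale Price' conversion goes through `astype(str)`, and a missing value does not survive that as an empty string, so `replace("", 0)` never touches it and `astype(float)` produces NaN again. The sketch below reproduces both effects and shows one possible cleanup. The sample data and the `clean_numeric` helper are hypothetical, since the real files are not shown, and filling with 0 simply follows the question's own choice.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Training_Data; the real data is not shown in the question.
training_data = pd.DataFrame({
    "Paid": [100.0, np.nan, 250.0],
    "Sale Price": ["120", np.nan, ""],
})

# 1) fillna() returns a new Series. Without an assignment the column is unchanged.
training_data["Paid"].fillna(0).astype(float)
paid_still_has_nan = training_data["Paid"].isna().any()

# 2) The missing value is not an empty string after astype(str), so replace("", 0)
#    skips it and astype(float) yields NaN again.
roundtrip = (
    training_data["Sale Price"].astype(str).str.strip().replace("", 0).astype(float)
)
price_still_has_nan = roundtrip.isna().any()

print("Paid still has NaN:", paid_still_has_nan)
print("Sale Price still has NaN:", price_still_has_nan)


def clean_numeric(series: pd.Series) -> pd.Series:
    """Hypothetical helper: coerce to float and map unparseable values, NaN and +/-inf to 0."""
    numeric = pd.to_numeric(series, errors="coerce")
    return numeric.replace([np.inf, -np.inf], np.nan).fillna(0).astype(float)


# Assign the cleaned result back to the DataFrame so the change actually sticks.
for col in ["Paid", "Sale Price"]:
    training_data[col] = clean_numeric(training_data[col])

print(training_data)
```

The same cleanup would need to be applied to `Test_Data` before calling `sc_X.transform`. Whether 0 is a sensible replacement for a missing 'Sale Price' depends on the data; dropping those rows with `dropna` may be the better choice for a target variable.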
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Kyriakos |
