Scikit-learn: ValueError: X has 1007 features, but StandardScaler is expecting 1016 features as input
When I try to predict the outcome for a test dataset using a pipeline fitted on the training dataset, I get the following error:
Traceback (most recent call last):
  File "c:\Users\username\Desktop\Project\PythonTool\calculator\database-analyzer\database_analyzer.py", line 380, in <module>
    main()
  File "c:\Users\username\Desktop\Project\PythonTool\calculator\database-analyzer\database_analyzer.py", line 349, in main
    bestmodel_predictor(train_original, train_data, test_original)
  File "c:\Users\username\Desktop\Project\PythonTool\calculator\database-analyzer\utilities_module.py", line 1768, in bestmodel_predictor
    test_original[TARGET_VARIABLE] = predictor_model(train_original,
  File "c:\Users\username\Desktop\Project\PythonTool\calculator\database-analyzer\utilities_module.py", line 1734, in predictor_model
    predictions = np.rint(pipe_abc.predict(test_dataset))
  File "C:\ProgramData\Anaconda3\envs\tf\lib\site-packages\sklearn\utils\metaestimators.py", line 113, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs) # noqa
  File "C:\ProgramData\Anaconda3\envs\tf\lib\site-packages\sklearn\pipeline.py", line 469, in predict
    Xt = transform.transform(Xt)
  File "C:\ProgramData\Anaconda3\envs\tf\lib\site-packages\sklearn\preprocessing\_data.py", line 973, in transform
    X = self._validate_data(
  File "C:\ProgramData\Anaconda3\envs\tf\lib\site-packages\sklearn\base.py", line 585, in _validate_data
    self._check_n_features(X, reset=reset)
  File "C:\ProgramData\Anaconda3\envs\tf\lib\site-packages\sklearn\base.py", line 400, in _check_n_features
    raise ValueError(
ValueError: X has 1007 features, but StandardScaler is expecting 1016 features as input.
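To show where the message comes from, here is a minimal sketch that reproduces the same check outside my project. It assumes random arrays with the same column counts as in the traceback; the helper name `feature_count_mismatch` and the arrays are illustrative only and are not part of my code. It relies on the `n_features_in_` attribute that a fitted scikit-learn estimator records during `fit`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def feature_count_mismatch(fitted_step, X) -> tuple:
    """Return (expected, received) feature counts for a fitted step and an input."""
    expected = fitted_step.n_features_in_  # recorded when fit() was called
    received = np.asarray(X).shape[1]      # number of columns in the new input
    return expected, received


# Illustrative data only: same column counts as in the traceback.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 1016))
X_test = rng.normal(size=(5, 1007))

scaler = StandardScaler(with_mean=False).fit(X_train)
expected, received = feature_count_mismatch(scaler, X_test)
print(f"scaler expects {expected} features, test data has {received}")

# transform() validates the column count before scaling and raises on mismatch.
try:
    scaler.transform(X_test)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Running the same comparison on the object actually passed to `pipe_abc.predict` should show which dataset is nine columns short.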
Here is the relevant portion of my code, including the lines that raise the error. I am not sure how to resolve the mismatch between the number of features StandardScaler expects and the number of features in X:
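import numpy as np
import pandas as pd
from pathlib import Path
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def predictor_model(some_df: pd.DataFrame, test_dataset: pd.DataFrame) -> np.ndarray:
    """Fit the pipeline on the training data and predict the test dataset.

    Args:
        some_df (pd.DataFrame): training data with the target in the last column
        test_dataset (pd.DataFrame): test features, same columns as the training features

    Returns:
        np.ndarray: predictions rounded to the nearest integer
    """
    pipe_abc = pipeline_maker()
    features_final, target_final = featuretarget_separator(some_df)
    pipe_abc.fit(features_final, target_final)
    # predict() fails if test_dataset has a different number of columns
    # than features_final had during fit()
    predictions = np.rint(pipe_abc.predict(test_dataset))
    return predictions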
def bestmodel_predictor(train_original: pd.DataFrame, train_data: pd.DataFrame,
                        test_original: pd.DataFrame) -> pd.DataFrame:
    """Predict the class of a test dataset whose target is missing.

    Args:
        train_original (pd.DataFrame): training data with the target in the last column
        train_data (pd.DataFrame): frame whose columns select what is written to the CSV
        test_original (pd.DataFrame): test data without the target

    Returns:
        pd.DataFrame: the test columns shared with train_data, including predictions
    """
    # Align "train" and "test" datasets: same columns, same order
    test_original[TARGET_VARIABLE] = 0
    test_original = test_original[train_original.columns]
    # Drop the placeholder target (last column) before predicting
    test_original = test_original.iloc[:, :-1].copy()
    # `test_dataset` is not defined in this function; pass the aligned
    # test features instead
    test_original[TARGET_VARIABLE] = predictor_model(train_original,
                                                     test_original)
    test_prediction = test_original[test_original.columns.intersection(
        train_data.columns)]
    # Save the predictions to a CSV file
    df2csv(test_prediction, WORKING_DIR / Path(TEST_PREDICTION), True, False)
    return test_prediction
def featuretarget_separator(some_df: pd.DataFrame) -> tuple:
    """Split a dataframe into features and target for later ML modelling.

    Args:
        some_df (pd.DataFrame): data with the target in the last column

    Returns:
        tuple: (features array, target array)
    """
    # All columns except the last are features; the last column is the target
    return (some_df.values[:, :-1], some_df.values[:, -1])
def pipeline_maker() -> Pipeline:
    """Return the pipeline for the best-fit model.

    Returns:
        Pipeline: a scaler followed by a random forest classifier
    """
    steps = [('scaler', StandardScaler(with_mean=False)),
             ('rfc',
              RandomForestClassifier(random_state=SEED,
                                     n_estimators=105,
                                     max_features=19,
                                     max_depth=2))]
    return Pipeline(steps)
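One possible way to make the test features line up with the training features is sketched below. It assumes the missing columns are indicator (one-hot) columns for which 0 is a sensible fill value, which may not hold for my data. The helper `align_test_to_train` and the toy frames are hypothetical and only illustrate `DataFrame.reindex`.

```python
import pandas as pd


def align_test_to_train(train_df: pd.DataFrame, test_df: pd.DataFrame,
                        target: str) -> pd.DataFrame:
    """Return test_df with exactly the training feature columns, in training order."""
    feature_cols = [col for col in train_df.columns if col != target]
    # reindex drops columns unknown to the training data and adds the
    # missing ones, filled with 0 (an assumption, see the note above)
    return test_df.reindex(columns=feature_cols, fill_value=0)


# Toy data: the test frame lacks "b", has an extra column, and a different order.
train = pd.DataFrame({"a": [1, 2], "b": [0, 1], "c": [1, 0], "y": [0, 1]})
test = pd.DataFrame({"c": [1], "a": [5], "extra": [9]})

aligned = align_test_to_train(train, test, "y")
print(aligned.columns.tolist())  # ['a', 'b', 'c']
```

The aligned frame, rather than the raw test data, would then be the argument passed to `pipe_abc.predict`.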
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow