XGBoost giving a static prediction of "0.5" randomly

I am using a scikit-learn pipeline with XGBRegressor. The pipeline fits without any error. When I predict with it, I run the exact same data through predict multiple times, and sometimes, seemingly at random, a prediction comes back as 0.5 while the normal prediction range is about 1,000-10,000. A minimal sketch of how I reproduce this follows the bullets below.

e.g.: (1258.2, 1258.2, 1258.2, 1258.2, 1258.2, 1258.2, 0.5, 1258.2, 1258.2, 1258.2, 1258.2)

  • The input data is exactly the same on every call
  • The environment is the same
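
Here is that sketch (pipe is the fitted pipeline defined below; X_test is a placeholder name for the batch of rows I score):

    import numpy as np

    # Score the identical batch repeatedly; the output should be constant
    preds = np.array([pipe.predict(X_test)[0] for _ in range(100)])
    print(np.unique(preds))  # mostly 1258.2, but 0.5 occasionally appears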

    import numpy as np
    import xgboost
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.model_selection import RandomizedSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # Impute and scale numeric columns; impute and one-hot encode
    # categorical columns.
    numeric_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='mean')),
        ('scaler', StandardScaler())])
    categorical_transformer = Pipeline(steps=[
        ('imputer',
         SimpleImputer(strategy='constant', fill_value='missing')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ])
    
    numeric_features = X.select_dtypes(
        include=['int64', 'float64']).columns
    categorical_features = X.select_dtypes(
        include=['object']).columns
    preprocessor = ColumnTransformer(
        transformers=[
            ('num', numeric_transformer, numeric_features),
            ('cat', categorical_transformer, categorical_features)])
    
    # Number of trees
    n_estimators = [int(x) for x in
                    np.linspace(start=50, stop=1000, num=10)]
    # Maximum number of levels in tree
    max_depth = [int(x) for x in np.linspace(1, 32, 32, endpoint=True)]
    # Booster
    booster = ['gbtree', 'gblinear', 'dart']
    # Gamma (minimum loss reduction required to make a split)
    gamma = [i / 10.0 for i in range(0, 5)]
    # Learning rate
    learning_rate = np.linspace(0.01, 0.2, 15)
    # Evaluation metric
    #         eval_metric = ['rmse','mae']
    # regularization
    reg_alpha = [1e-5, 1e-2, 0.1, 1, 100]
    reg_lambda = [1e-5, 1e-2, 0.1, 1, 100]
    # Minimum child weight
    min_child_weight = list(range(1, 6, 2))
    # Row and column subsampling fractions
    subsample = [i / 10.0 for i in range(6, 10)]
    colsample_bytree = [i / 10.0 for i in range(6, 10)]
    
    # Create the random grid
    random_grid = {'n_estimators': n_estimators,
                   'max_depth': max_depth,
                   'booster': booster,
                   'gamma': gamma,
                   'learning_rate': learning_rate,
                   #                        'eval_metric' : eval_metric,
                   'reg_alpha': reg_alpha,
                   'reg_lambda': reg_lambda,
                   'min_child_weight': min_child_weight,
                   'subsample': subsample,
                   'colsample_bytree': colsample_bytree
                   }
    
    # Use the random grid to search for the best hyperparameters.
    # First create the base model to tune.
    xgb_reg = xgboost.XGBRegressor(objective='reg:squarederror', n_jobs=4)
    # Random search over the grid: 3-fold cross validation,
    # 100 sampled parameter combinations, 4 parallel jobs.
    xgb_random = RandomizedSearchCV(estimator=xgb_reg,
                                    param_distributions=random_grid,
                                    n_iter=100,
                                    cv=3,
                                    verbose=0,
                                    random_state=42,
                                    n_jobs=4)

    pipe = Pipeline(steps=[('preprocessor', preprocessor),
                           ('regressor', xgb_random)])
    
    pipe.fit(X, y)
    

What could be the issue?



Solution 1:[1]

If you're getting some unusually low predictions, it probably indicates that the dependent variable contains outliers. I'd suggest reading about this problem and the different strategies for tackling it.

It is usually not a good idea to fit the model on all data samples without any outlier handling; doing so tends to give a worse fit and non-representative metrics.
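
As a rough sketch of one common strategy, you could trim the target with an IQR rule before fitting (X, y and pipe are the objects from the question; the 1.5 multiplier is just the conventional default):

    import numpy as np

    # Flag targets outside the usual 1.5 * IQR whiskers as outliers
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    mask = (y >= q1 - 1.5 * iqr) & (y <= q3 + 1.5 * iqr)

    # Refit on the trimmed data and compare metrics
    pipe.fit(X[mask], y[mask])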

Solution 2:[2]

It is probably because you have NaN or None values in your target (y). It may also be relevant that 0.5 is XGBoost's default base_score, i.e. the value every prediction starts from before tree contributions are added, which would explain why the stray value is exactly 0.5.
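
A quick check along those lines (assuming y is a pandas Series, as in the question):

    import pandas as pd

    # Count missing labels; anything non-zero is a problem
    print(pd.isna(y).sum())

    # Drop rows with a missing target before fitting
    mask = pd.notna(y)
    pipe.fit(X[mask], y[mask])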

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1
Solution 2    Dror Hilman