Why does using a prediction as a feature decrease performance?
The input consists of 300-dimensional features (denoted X), and the label (denoted y) is the daily return of each stock. The baseline is the prediction (denoted y_xgb) from XGBoost.
I also use a deep learning model to make a prediction (denoted y_dl) from the same 300-dimensional features. This prediction is somewhat worse than the one from XGBoost. However, the linear correlation corr(y_dl, y) is higher than every corr(x_i, y), where x_i is the i-th feature of X. I therefore treated y_dl as a strong feature and ran XGBoost on [y_dl, X], i.e. 301-dimensional features.
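For concreteness, here is a minimal sketch of that setup; the arrays X, y, and y_dl below are synthetic placeholders standing in for the real data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 300))             # 300-dimensional features (placeholder)
y = rng.normal(size=5000)                    # daily returns (placeholder)
y_dl = y + rng.normal(scale=2.0, size=5000)  # deep-learning prediction (placeholder)

# Pearson correlation of each original feature with the label
feature_corrs = np.array([np.corrcoef(X[:, i], y)[0, 1] for i in range(X.shape[1])])
dl_corr = np.corrcoef(y_dl, y)[0, 1]

print(f"best single-feature |corr|: {np.abs(feature_corrs).max():.4f}")
print(f"|corr(y_dl, y)|:            {abs(dl_corr):.4f}")

# Augment the feature matrix: [y_dl, X] -> 301 columns
X_aug = np.column_stack([y_dl, X])
```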
Sadly, I find that: 1) the accuracy is slightly worse than the y_xgb baseline; 2) early stopping triggers much sooner than for the baseline, which suggests the augmented model overfits more easily.
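This is roughly how the two runs are compared (continuing from the snippet above; the XGBoost parameters are placeholders, not the ones I actually use):

```python
import xgboost as xgb
from sklearn.model_selection import train_test_split

def fit_with_early_stop(features, target, seed=0):
    # Hold out a validation set for early stopping
    X_tr, X_val, y_tr, y_val = train_test_split(
        features, target, test_size=0.2, random_state=seed)
    dtrain = xgb.DMatrix(X_tr, label=y_tr)
    dval = xgb.DMatrix(X_val, label=y_val)
    return xgb.train(
        {"objective": "reg:squarederror", "eta": 0.05, "max_depth": 6},
        dtrain,
        num_boost_round=2000,
        evals=[(dval, "val")],
        early_stopping_rounds=50,
        verbose_eval=False,
    )

baseline = fit_with_early_stop(X, y)       # 300-d input
stacked = fit_with_early_stop(X_aug, y)    # 301-d input with y_dl prepended
print("baseline stopped at round:", baseline.best_iteration)
print("stacked stopped at round: ", stacked.best_iteration)
```

The stacked run stops much earlier than the baseline.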
Now I am confused about what is going wrong.