KNN: How can I reverse unseen encoded labels?
I am trying to make predictions with KNN, but since the target is a float I need to encode it so that scikit-learn accepts it. This is my approach, which works: I can train and predict. But the output is obviously encoded:
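import pandas as pd
from sklearn import preprocessing, utils
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the data, using the 'date' column as a datetime index
df = pd.read_csv('data.csv', index_col='date', parse_dates=True)

# Features are every column except the target
X = df.drop(["predictor_pct_chg"], axis=1).values
y = df["predictor_pct_chg"].values

# Chronological split: no shuffling, the last 20% is the test set
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    shuffle=False,
)

# Encode the float targets as integer class labels
lab_enc = preprocessing.LabelEncoder()
training_scores_encoded = lab_enc.fit_transform(y_train)
print(training_scores_encoded)
print(utils.multiclass.type_of_target(y_train))
print(utils.multiclass.type_of_target(y_train.astype('int')))
print(utils.multiclass.type_of_target(training_scores_encoded))

# Fit the classifier on the encoded targets and predict encoded labels
knn = KNeighborsClassifier()
knn.fit(
    X_train,
    training_scores_encoded,
)
y_pred = knn.predict(X_test)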
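To illustrate why the encoding step seems necessary at all, here is a minimal sketch of what type_of_target reports before and after encoding. The array below is made-up data standing in for predictor_pct_chg, not the values from data.csv.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.multiclass import type_of_target

# Hypothetical float targets (assumption: stand-in for predictor_pct_chg).
y = np.array([0.5, -1.2, 0.5, 2.0])

# Floats with non-integer values are reported as a continuous target,
# which classifiers reject.
kind_raw = type_of_target(y)

# After label encoding, the targets are integer codes, i.e. class labels.
kind_encoded = type_of_target(LabelEncoder().fit_transform(y))

print(kind_raw, kind_encoded)
```

The encoder treats every distinct float as its own class, so the classifier can only ever predict values that occurred in y_train.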
Training and making a prediction works fine, but now I want to plot the prediction and compare it to my y_test:
import matplotlib.pyplot as plt

# Map the encoded predictions back to the original values
y_pred = lab_enc.inverse_transform(y_pred)

plt.plot(y_test, color='red', label='Actual')
plt.plot(y_pred, color='blue', label='Prediction')
plt.xlabel('Time')
plt.ylabel('% Change')
plt.legend()
plt.show()
Now inverse_transform() does not work, because the LabelEncoder has never seen the prediction before. So how can I reverse the encoding? I could apply the LabelEncoder to y_test as well and compare that to y_pred, but this doesn't make sense, since I need a useful prediction in the actual unit (here: %). Otherwise I cannot interpret the predictions.
Error:
ValueError: y contains previously unseen labels:
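For reference, a minimal sketch of when LabelEncoder.inverse_transform succeeds and when it raises this error. The values are made up for illustration and are not taken from data.csv.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical training targets (assumption: not the data from the question).
y_train = np.array([0.5, -1.2, 0.5, 2.0])

enc = LabelEncoder()
# classes_ holds the sorted unique values; each code is an index into it.
codes = enc.fit_transform(y_train)

# Codes that the encoder produced map back to the original values.
restored = enc.inverse_transform(codes)

# Only codes 0, 1 and 2 were assigned during fit, so code 3 is unknown.
try:
    enc.inverse_transform(np.array([3]))
    raised = False
except ValueError as err:
    raised = True
    message = str(err)

print(codes, restored, raised)
```

In other words, inverse_transform only accepts integer codes in the range 0 to len(classes_) - 1. Passing anything else, for example an array that has already been inverse-transformed once, would trigger the same ValueError.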
Solution 1:[1]
My guess at the origin of this ValueError:
since you haven't stratified the data by y in train_test_split, y_test could contain labels that were not present in the training data. So try setting the train_test_split parameter stratify=y.
For a detailed explanation, see the Stratification section of the scikit-learn User Guide.
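A minimal sketch of what stratify=y does, using made-up data with two discrete classes (an assumption, not the data from the question). Note that train_test_split only supports stratification together with shuffling, so it cannot be combined with shuffle=False as used in the question. It also needs at least two samples per class, so it will likely not work on a target where most values are unique.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples, two discrete classes with 10 samples each.
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 10 + [1] * 10)

# stratify=y keeps the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

train_classes = set(y_train.tolist())
test_classes = set(y_test.tolist())

# Stratification combined with shuffle=False is rejected with a ValueError.
try:
    train_test_split(X, y, test_size=0.2, stratify=y, shuffle=False)
    stratify_without_shuffle_ok = True
except ValueError:
    stratify_without_shuffle_ok = False

print(train_classes, test_classes, stratify_without_shuffle_ok)
```

With a stratified split, every class that appears in y_test also appears in y_train, which is the property this answer relies on.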
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Konstantin Z |
