Why does the confusion matrix show different results from the random undersampling class distribution?
I have an imbalanced dataset that consists of 17 numerical features and 3 output classes. I applied random undersampling and then computed the confusion matrix shown below.
My question: when random undersampling gives 33 samples for each class, why does the confusion matrix show more than 33?
# Raw data distribution
layers_counts = y.value_counts()
layers_counts
# Output
2    498
1    116
0     39
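For context, the Xtrain/Xtest/ytrain/ytest variables used below are not defined in the snippet. A minimal sketch of how they would typically be created (the split parameters here are assumptions, not taken from the original code; test_size=0.15 is chosen only because it would leave roughly 33 minority samples in the training split):

from sklearn.model_selection import train_test_split

# Hypothetical split; test_size, stratify and random_state are assumptions
Xtrain, Xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42
)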
from imblearn.under_sampling import RandomUnderSampler

# Undersample every class except the minority down to the minority count
rus = RandomUnderSampler(sampling_strategy="not minority")
X_rus, y_rus = rus.fit_resample(Xtrain, ytrain)
y_rus.value_counts()
# Output
0    33
1    33
2    33
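The balanced 33/33/33 above applies only to the resampled training data. As a quick sanity check (a sketch, assuming the ytest variable from the split above), the held-out test labels can be inspected the same way:

# The test split is untouched by the undersampler, so its class counts
# still follow the original imbalanced distribution
ytest.value_counts()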
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Train on the balanced (undersampled) data, evaluate on the original test split
classifier = LogisticRegression()
classifier.fit(X_rus, y_rus)
ypred = classifier.predict(Xtest)

cm = confusion_matrix(ytest, ypred)
cm_df2 = pd.DataFrame(cm,
                      index=['VCS', 'VSG', 'VG'],
                      columns=['VCS', 'VSG', 'VG'])

plt.figure(figsize=(8, 6))
sns.heatmap(cm_df2, annot=True)
plt.title('Confusion Matrix')
plt.ylabel('Actual Values')
plt.xlabel('Predicted Values')
plt.show()
Since the undersampler produced 33 samples per class, I expected the numbers in the confusion matrix above to add up to 33 per class as well, but they are larger. I am confused about that point; could you help me understand it?
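For reference, scikit-learn's confusion matrix has one row per true class, so each row sums to the number of samples of that class in the labels passed to confusion_matrix. A minimal check (a sketch, assuming the variable names above):

# Row totals of the confusion matrix = per-class counts of ytest,
# the labels that were actually evaluated (not the resampled y_rus)
print(cm.sum(axis=1))
print(ytest.value_counts().sort_index().to_numpy())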
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow