Multiclass classification: explore correlation among classes

I have a classification task on an imbalanced multiclass dataset. To get a better feel for the data, I want to explore how the classes relate to each other: how well they are separated and which ones tend to mix.

Below is an example with a synthetic dataset that has 5 classes.


from collections import Counter
from sklearn.datasets import make_classification

# 1000 samples, 5 imbalanced classes
X, y = make_classification(1000, n_classes=5, n_informative=10, weights=[.1, .13, .15, .17, .45])

# Count the number of samples per class
class_support = Counter(y)

for key, value in sorted(class_support.items()):
    print(f'Class: {key}, support: {value}')

Class: 0, support: 101
Class: 1, support: 133
Class: 2, support: 148
Class: 3, support: 168
Class: 4, support: 450

I want to visualise the boundaries between these classes to understand their separability, but I have no idea how this could be done using matplotlib or seaborn.

I may have to restrict this to a subset of the features (the more relevant ones), but a general idea of how to visualise class separability would help me get started.



Solution 1:[1]

You can use PCA to reduce the data to 3 dimensions and then draw the result as a 3D plot.

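from sklearn.decomposition import PCA

# Project the features onto the first 3 principal components
pca = PCA(n_components=3)
X_out = pca.fit_transform(X)  # fits and transforms in one step, no separate fit() needed
print(X_out.shape)  # (1000, 3)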
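The answer stops at the projection, so here is a minimal sketch of how the 3D plot itself might look with matplotlib, one colour per class. The `random_state=0` argument, the marker settings and the output filename `pca_classes_3d.png` are assumptions added for illustration and are not part of the original answer.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# Same kind of synthetic data as in the question.
# random_state is an added assumption so the figure is reproducible.
X, y = make_classification(n_samples=1000, n_classes=5, n_informative=10,
                           weights=[.1, .13, .15, .17, .45], random_state=0)

# Reduce to 3 principal components
X_out = PCA(n_components=3).fit_transform(X)

fig = plt.figure(figsize=(7, 6))
ax = fig.add_subplot(projection="3d")

# One scatter call per class so each class gets its own colour and legend entry
for label in np.unique(y):
    mask = y == label
    ax.scatter(X_out[mask, 0], X_out[mask, 1], X_out[mask, 2],
               s=10, alpha=0.6, label=f"Class {label}")

ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.set_zlabel("PC 3")
ax.legend()

# Hypothetical filename; call plt.show() instead when working interactively
fig.savefig("pca_classes_3d.png", dpi=150)
```

Keep in mind that PCA is linear and ignores the labels, so classes that overlap in this projection are not necessarily inseparable in the full feature space. Treat the plot as a first look rather than a verdict.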

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 desertnaut