'Python Kmeans visualization (High Dimensions)

enter image description hereI have to cluster my customers whose have more than 15 dimensions with Python. Can you advice me, is it correct to visualize clusters after Kmeans with T-SNE method? I received very good plot even with outliers in my data-frame. This makes me doubt if I'm doing everything right. My colleagues who make clustering with High Dimensions on R don't use any method of dimensionality reduction like PCA or T-SNE and noticed that probably I'm not correct. It's my first experience in it. Thanks in advance for your help.

I wrote my code from example : https://www.kaggle.com/minc33/visualizing-high-dimensional-clusters

My code:

#import libraries
import numpy as np
import pandas as pd
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import pandas.io.sql as psql
import plotly.graph_objs as go
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE 
from sklearn.cluster import KMeans
from scipy import stats
import matplotlib.pyplot as plt
#download my dataframe
data = pd.read_csv ('C:\\Users\\Desktop\python\ex.csv')
#use data without customer id
d=data.iloc[:,1:20]
#Kmeans
X =d.copy()
scaler = MinMaxScaler()
numer = pd.DataFrame(scaler.fit_transform(X))
kmeans = KMeans(n_clusters=3)
kmeans.fit(numer)
clusters = kmeans.predict(numer)
numer["Cluster"] = clusters
#visualisation
plotX = pd.DataFrame(np.array(numer.sample(10000)))
plotX.columns =numer.columns
perplexity = 50
tsne_2d = TSNE(n_components=2, perplexity=perplexity)
TCs_2d = pd.DataFrame(tsne_2d.fit_transform(plotX.drop(["Cluster"], axis=1)))
TCs_2d.columns = ["TC1_2d","TC2_2d"]
plotX = pd.concat([plotX,TCs_2d], axis=1, join='inner')
cluster0 = plotX[plotX["Cluster"] == 0]
cluster1 = plotX[plotX["Cluster"] == 1]
cluster2 = plotX[plotX["Cluster"] == 2]
trace1 = go.Scatter(
                    x = cluster0["TC1_2d"],
                    y = cluster0["TC2_2d"],
                    mode = "markers",
                    name = "Cluster 0",
                    marker = dict(color = 'rgba(255, 128, 255, 0.8)'),
                    text = None)
#trace2 is for 'Cluster 1'
trace2 = go.Scatter(
                    x = cluster1["TC1_2d"],
                    y = cluster1["TC2_2d"],
                    mode = "markers",
                    name = "Cluster 1",
                    marker = dict(color = 'rgba(255, 128, 2, 0.8)'),
                    text = None)
#trace3 is for 'Cluster 2'
trace3 = go.Scatter(
                    x = cluster2["TC1_2d"],
                    y = cluster2["TC2_2d"],
                    mode = "markers",
                    name = "Cluster 2",
                    marker = dict(color = 'rgba(0, 255, 200, 0.8)'),
                    text = None)
data = [trace1, trace2, trace3]
title = "Visualizing Clusters in Two Dimensions Using T-SNE (perplexity=" + str(perplexity) + ")"
layout = dict(title = title,
              xaxis= dict(title= 'TC1',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'TC2',ticklen= 5,zeroline= False)
             )
fig = dict(data = data, layout = layout)
plot(fig)


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source