'Detecting Patterns in dataset using clustering algorithm
Solution 1:[1]
Sorry, forget SVM, that's for classification. Honestly, I didn't ready your question carefully the first time. I just re-read what you posted originally. Try Mean Shift for automatically detecting the optimal number of clusters. Here's an example. Hopefully you can adapt it for your specific use.
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs
# #############################################################################
# Generate sample data
centers = [[1, 1], [-1, -1], [1, -1], [1, -1], [1, -1]]
X, _ = make_blobs(n_samples=10000, centers=centers, cluster_std=0.2)
# #############################################################################
# Compute clustering with MeanShift
# The following bandwidth can be automatically detected using
bandwidth = estimate_bandwidth(X, quantile=0.6, n_samples=5000)
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
ms.fit(X)
labels = ms.labels_
cluster_centers = ms.cluster_centers_
labels_unique = np.unique(labels)
n_clusters_ = len(labels_unique)
print("number of estimated clusters : %d" % n_clusters_)
# #############################################################################
# Plot result
import matplotlib.pyplot as plt
from itertools import cycle
plt.figure(1)
plt.clf()
colors = cycle('bgrcmykbgrcmykbgrcmykbgrcmyk')
for k, col in zip(range(n_clusters_), colors):
my_members = labels == k
cluster_center = cluster_centers[k]
plt.plot(X[my_members, 0], X[my_members, 1], col + '.')
plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
markeredgecolor='k', markersize=14)
plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()
Or, try this.
import numpy as np
import pandas as pd
from sklearn.cluster import MeanShift
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.datasets import make_blobs
# We will be using the make_blobs method
# in order to generate our own data.
clusters = [[2, 2, 2], [7, 7, 7], [5, 13, 13]]
X, _ = make_blobs(n_samples = 150, centers = clusters,
cluster_std = 0.60)
# After training the model, We store the
# coordinates for the cluster centers
ms = MeanShift()
ms.fit(X)
cluster_centers = ms.cluster_centers_
# Finally We plot the data points
# and centroids in a 3D graph.
fig = plt.figure()
ax = fig.add_subplot(111, projection ='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], marker ='o')
ax.scatter(cluster_centers[:, 0], cluster_centers[:, 1],
cluster_centers[:, 2], marker ='x', color ='red',
s = 300, linewidth = 5, zorder = 10)
plt.show()
There are a few clustering methodologies that help you choose the optimal number of clusters automatically. Check out the link below for some ideas of how to move forward with your project.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |



