Error running HDBSCAN from Scikit-learn example
I got the following error when running this example, combined with this one from Scikit-learn, on my own dataset (about half a million samples):
plt.scatter(data_.T[0][c_exemplars], data_.T[1][c_exemplars], c=palette[i], **plot_kwds)
IndexError: list index out of range
The adapted code is the following:
import hdbscan
import matplotlib.pyplot as plt
import seaborn as sns
import umap

reducer = umap.UMAP()
data_ = reducer.fit_transform(data)
sns.set_context('poster')
sns.set_style('white')
sns.set_color_codes()
plot_kwds={'alpha':0.25, 's':60, 'linewidths':0}
palette = sns.color_palette('deep', 12)
clusterer = hdbscan.HDBSCAN(min_cluster_size=15, metric='manhattan').fit(data_)
tree = clusterer.condensed_tree_
plt.scatter(data_.T[0], data_.T[1], c='grey', **plot_kwds)
for i, c in enumerate(tree._select_clusters()):
    c_exemplars = self.exemplars(c, tree)
    plt.scatter(data_.T[0][c_exemplars], data_.T[1][c_exemplars], c=palette[i], **plot_kwds)
plt.plot()
The self.exemplars() function is exactly the same as the one implemented in the example. Apparently I need more colors, since the palette has only 12 entries while the number of clusters is around 8k. How can I manage this?
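To make the failure concrete: palette is a list of 12 colors, so palette[i] raises IndexError as soon as i reaches 12. Below is a minimal sketch of two possible workarounds, using only the standard library so that it runs on its own. The helpers make_palette and color_for are hypothetical names introduced here for illustration, not part of hdbscan or seaborn, and the figure of 8000 clusters is only the rough count mentioned above. With seaborn, sns.color_palette('husl', n) should similarly give n evenly spaced hues, though thousands of colors will not be visually distinguishable either way.

```python
import colorsys


def make_palette(n_colors, saturation=0.65, value=0.9):
    """Hypothetical helper: n_colors RGB tuples with evenly spaced hues."""
    return [
        colorsys.hsv_to_rgb(k / n_colors, saturation, value)
        for k in range(n_colors)
    ]


def color_for(index, palette):
    """Hypothetical helper: wrap the index so it never runs past the palette."""
    return palette[index % len(palette)]


# Option 1: keep a small palette and reuse colors cyclically.
small_palette = make_palette(12)
wrapped = color_for(12, small_palette)  # index 12 maps back to entry 0

# Option 2: size the palette to the number of clusters (8000 is illustrative).
big_palette = make_palette(8000)
```

In the loop from the question, this would amount to passing color=color_for(i, palette) (or color=palette[i % len(palette)]) instead of c=palette[i]; using color= rather than c= for a single RGB tuple avoids matplotlib treating the tuple as a sequence of values to colormap.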
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
