Error running HDBSCAN from Scikit-learn example

I got the error below when running an HDBSCAN example, combined with another example from Scikit-learn, on my own dataset (~half a million samples):

        plt.scatter(data_.T[0][c_exemplars], data_.T[1][c_exemplars], c=palette[i], **plot_kwds)
    IndexError: list index out of range

The adapted code is the following:

    import hdbscan
    import matplotlib.pyplot as plt
    import seaborn as sns
    import umap

    # Reduce the dataset to 2D so it can be plotted
    reducer = umap.UMAP()
    data_ = reducer.fit_transform(data)

    sns.set_context('poster')
    sns.set_style('white')
    sns.set_color_codes()

    plot_kwds = {'alpha': 0.25, 's': 60, 'linewidths': 0}
    palette = sns.color_palette('deep', 12)  # only 12 colors

    clusterer = hdbscan.HDBSCAN(min_cluster_size=15, metric='manhattan').fit(data_)

    tree = clusterer.condensed_tree_
    plt.scatter(data_.T[0], data_.T[1], c='grey', **plot_kwds)
    for i, c in enumerate(tree._select_clusters()):
        c_exemplars = self.exemplars(c, tree)
        # Raises IndexError once i >= 12, because palette has 12 entries
        plt.scatter(data_.T[0][c_exemplars], data_.T[1][c_exemplars], c=palette[i], **plot_kwds)

    plt.plot()

The self.exemplars() function is exactly the same as the one implemented in the example. Apparently I need more colors, since the number of clusters is around 8k while the palette has only 12 entries. How can I handle this?
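
One possible way around the IndexError, sketched under a few assumptions: either wrap the cluster index around the existing palette, or build a palette with one entry per cluster. The helper names `make_palette` and `color_for` below are hypothetical (they are not part of seaborn or hdbscan), and the sketch uses only the standard-library `colorsys` module so it runs without the plotting stack.

```python
import colorsys


def make_palette(n_colors, lightness=0.6, saturation=0.65):
    """Return n_colors RGB tuples with evenly spaced hues (hypothetical helper)."""
    if n_colors <= 0:
        return []
    return [
        colorsys.hls_to_rgb(k / n_colors, lightness, saturation)
        for k in range(n_colors)
    ]


def color_for(i, palette):
    """Pick the color for cluster index i, wrapping around when i >= len(palette)."""
    return palette[i % len(palette)]


# Option 1: keep a small palette and cycle through it.
small_palette = make_palette(12)
wrapped = color_for(8000, small_palette)  # no IndexError, colors repeat

# Option 2: one color per cluster (assumed cluster count of about 8k).
n_clusters = 8000
big_palette = make_palette(n_clusters)
last = color_for(n_clusters - 1, big_palette)
```

In the plotting loop, `c=palette[i]` would then become something like `color=color_for(i, palette)`. If seaborn is preferred, `sns.color_palette('husl', n_clusters)` should produce a palette of the requested length in a similar way. Either way, thousands of colors will not be visually distinguishable, so cycling a small palette may be just as informative.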



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
