'is there a way to cluster tweets after vectorizing them?

I need to cluster tweets based on similarity between them, I am using dec2vec to vectorize them and now I need a way to cluster this vectors, also I tried kmeans and it wasn't a good model for me as I don't know the number of clusters. I tried to use function similarity in gensim library but the result is different each time and wasn't correct! So is there a way to cluster this?



Solution 1:[1]

You need to know how many clusters you want for your particular task, before applying K-means or any other clustering algorithm. And if the number of clusters is very large, then some clustering algorithms like K-means will not be able to scale well. For large number of clusters, you could try some other clustering algorithms like agglomerative clustering or DBSCAN.

If you only need a small number of clusters but don't know the exact number of clusters, you could use T-SNE (T-distributed Stochastic Neighbourhood Embedding) to get an approximate 2-D visualisation of your vectorized tweets, to get an idea of how many clusters you would need.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Aravind G.