Is there any clustering method that prevents reordering?
For example, I have a text that consists of lines. Each line has its own length, indentation, and other features. My goal is to find poems in this text, but all the clustering methods I know reorder the lines and build clusters independently of their position in the text. I've tried using the position as one of the features, but I don't like the result. It would be great if you could point me to something like DBSCAN. Can you help me?
Solution 1:[1]
Clustering is probably not the right tool for your problem. There may be a segmentation algorithm that can be adapted to it.
But it is better to treat it as an optimization problem and solve it as such, instead of hoping that some clustering algorithm happens to work.
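To make the optimization view concrete, here is a minimal sketch of one possible formulation, not necessarily what the answer had in mind: split the ordered lines into contiguous segments so that the within-segment squared deviation from the segment mean is as small as possible, while paying a fixed cost for every segment. The function name `segment_lines`, the `penalty` parameter, and the `[length, indent]` feature rows are assumptions made for illustration. Because segments are contiguous by construction, lines are never reordered.

```python
from typing import List, Sequence


def segment_lines(features: Sequence[Sequence[float]], penalty: float) -> List[int]:
    """Split an ordered sequence of feature rows into contiguous segments.

    Minimizes the sum of within-segment squared deviations from the segment
    mean, plus `penalty` per segment (hypothetical tuning knob).
    Returns the start index of each segment.
    """
    n = len(features)
    if n == 0:
        return []
    dim = len(features[0])

    # Prefix sums of values and squared values per feature dimension,
    # so the cost of any segment [i, j) can be computed in O(dim).
    prefix = [[0.0] * dim]
    prefix_sq = [[0.0] * dim]
    for row in features:
        prefix.append([p + x for p, x in zip(prefix[-1], row)])
        prefix_sq.append([p + x * x for p, x in zip(prefix_sq[-1], row)])

    def cost(i: int, j: int) -> float:
        # Sum of squared deviations from the mean for rows i..j-1.
        length = j - i
        total = 0.0
        for d in range(dim):
            s = prefix[j][d] - prefix[i][d]
            sq = prefix_sq[j][d] - prefix_sq[i][d]
            total += sq - s * s / length
        return total

    # best[j] is the minimal total cost of segmenting the first j rows;
    # prev[j] is the start index of the last segment in that solution.
    best = [0.0] + [float("inf")] * n
    prev = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            candidate = best[i] + cost(i, j) + penalty
            if candidate < best[j]:
                best[j] = candidate
                prev[j] = i

    # Walk back through prev to recover the segment start indices.
    starts = []
    j = n
    while j > 0:
        starts.append(prev[j])
        j = prev[j]
    return starts[::-1]


# Made-up example: each row is [line length, indent].
# Three prose lines, four short indented lines, two prose lines.
features = [[80, 0], [78, 0], [82, 0],
            [30, 4], [28, 4], [32, 4], [31, 4],
            [79, 0], [81, 0]]
print(segment_lines(features, penalty=50.0))  # [0, 3, 7]
```

The penalty controls how many segments appear: a larger value merges more lines into one segment, a smaller one splits more eagerly. This version is quadratic in the number of lines, which is usually acceptable for a single text.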
Solution 2:[2]
I think this question boils down to which features to use. This is a natural language processing task, so I would suggest Word2Vec, e.g.:
- https://www.tensorflow.org/versions/r0.9/tutorials/word2vec/index.html
- https://radimrehurek.com/gensim/models/word2vec.html
This approach can embed words, sentences, and even documents in a vector space.
See also: Document classification with distributions of word vectors
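One simple way to turn word vectors into a per-line feature is to average the vectors of the words in the line. The sketch below assumes this averaging scheme and uses a tiny made-up embedding table; `line_vector` and `toy_embeddings` are hypothetical names, and in practice the vectors would come from a trained Word2Vec model rather than a hand-written dictionary.

```python
from typing import Dict, List


def line_vector(line: str, embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of the known words in a line.

    Returns a zero vector of length `dim` if no word is in `embeddings`.
    """
    vectors = [embeddings[w] for w in line.lower().split() if w in embeddings]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) groups the values of each dimension together.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy 2-dimensional embeddings, invented for illustration only.
toy_embeddings = {
    "rose": [1.0, 0.0],
    "red": [0.0, 1.0],
}
print(line_vector("Rose red", toy_embeddings, 2))  # [0.5, 0.5]
```

The resulting vectors can be used alongside line length and indentation as features for each line.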
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Has QUIT--Anony-Mousse |
| Solution 2 | Glorfindel |
