Multi-Class Document Classification with Both Known and Unknown Classes
I am building a multi-class document classifier that must assign each document to one of three known classes ("Financial Report", "Insurance_Sheet", "Endorsement") or to a fourth, unknown class ("Random Doc"). I have tried the following methods, but none gave good results: a considerable number of random documents were classified as one of the known classes.
- Method 1: TF-IDF + Linear SVC
- Method 2: Word2Vec for word embeddings, averaging the word embeddings to get one embedding vector per document, then feeding that vector to a classification model.
- Method 3: Doc2Vec to get the embedding vector for each document, then feeding that vector to a classification model.
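To make Method 2 concrete, here is a minimal sketch of the averaging step in plain Python. The `WORD_VECTORS` table, its three-dimensional vectors, and the `document_vector` helper are hypothetical stand-ins for illustration only; in the real setup the vectors would come from a trained Word2Vec model, and the tokenization would likely be more careful than a whitespace split.

```python
from typing import Dict, List

# Hypothetical toy embeddings (3 dimensions), for illustration only.
# A real setup would look vectors up in a trained Word2Vec model.
WORD_VECTORS: Dict[str, List[float]] = {
    "premium": [1.0, 0.0, 0.0],
    "policy": [0.0, 1.0, 0.0],
    "revenue": [0.0, 0.0, 1.0],
}
DIM = 3


def document_vector(text: str) -> List[float]:
    """Average the vectors of all in-vocabulary tokens in `text`."""
    tokens = text.lower().split()
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        # No token has an embedding: fall back to the zero vector.
        return [0.0] * DIM
    return [sum(vec[i] for vec in known) / len(known) for i in range(DIM)]


# "schedule" is out of vocabulary, so only "policy" and "premium" are averaged.
print(document_vector("Policy premium schedule"))  # [0.5, 0.5, 0.0]
```

In this sketch, a document with no in-vocabulary tokens collapses to the zero vector, and whatever classifier consumes that vector still has to assign it some label.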
Can you suggest a good approach for this case? Thanks a lot.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
