How to identify/detect a vocabulary in a text (Node.js)
I'm currently working on an app in which I have blocks of text, and I would like to know whether they're related to cooking/recipe vocabulary. I've seen and tried a few things, but I'm starting to wonder whether I'm going overboard (I don't want to reinvent the wheel).
The approach I'm working on now involves collecting all the words related to this vocabulary (ingredients, actions, objects, etc., in many languages), comparing each word in my text blocks against that database, and then computing a score for each block that would be used to decide (depending on my threshold) whether I should keep it or not.
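To make the idea concrete, here is a minimal sketch of that scoring step. The word list, the `THRESHOLD` value, and the function names (`tokenize`, `vocabularyScore`, `isCookingRelated`) are all made up for illustration; a real vocabulary would be far larger and cover several languages. The vocabulary is held in a `Set`, so each word lookup takes roughly constant time regardless of how many terms it contains.

```javascript
// Hypothetical mini-vocabulary, for illustration only.
const COOKING_TERMS = new Set([
  "bake", "boil", "flour", "oven", "recipe", "simmer", "whisk",
]);

// Arbitrary cut-off chosen for this example; tune it on real data.
const THRESHOLD = 0.2;

// Split text into lowercase word tokens. \p{L} matches any Unicode letter,
// so accented words in other languages are kept whole.
function tokenize(text) {
  return text.toLowerCase().match(/\p{L}+/gu) || [];
}

// Fraction of tokens that appear in the vocabulary (a value between 0 and 1).
function vocabularyScore(text, vocabulary) {
  const tokens = tokenize(text);
  if (tokens.length === 0) return 0;
  const hits = tokens.filter((token) => vocabulary.has(token)).length;
  return hits / tokens.length;
}

// Keep a block only if its score reaches the threshold.
function isCookingRelated(text) {
  return vocabularyScore(text, COOKING_TERMS) >= THRESHOLD;
}

console.log(vocabularyScore("Whisk the flour and bake in the oven", COOKING_TERMS)); // 0.5
console.log(isCookingRelated("The meeting starts at noon")); // false
```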
The main problem with this method is that I need to create a very big database myself (which is a painfully long process), and the bigger my database gets, the slower and less efficient the comparison process might become. Any ideas on how to do that? Thank you!
Solution 1:[1]
Consider using a text classifier and training it on relevant examples, i.e. text blocks labeled as cooking-related or not. A simple starting point is a Naive Bayes text classifier: it is fast and generates models of reasonable size.
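As a rough illustration of what such a classifier does, here is a small multinomial Naive Bayes sketch in plain Node.js with add-one (Laplace) smoothing. The `NaiveBayes` class, its method names, the labels, and the four training sentences are all invented for this example. In practice you would train on many more labeled blocks and would probably reach for an existing library rather than this hand-rolled version.

```javascript
// Minimal multinomial Naive Bayes with add-one smoothing (illustrative sketch).
class NaiveBayes {
  constructor() {
    this.docCounts = new Map();  // label -> number of training documents
    this.wordCounts = new Map(); // label -> Map(word -> occurrences)
    this.totalWords = new Map(); // label -> total tokens seen for that label
    this.vocab = new Set();      // every distinct word seen in training
    this.totalDocs = 0;
  }

  // Lowercase and split into runs of Unicode letters.
  static tokenize(text) {
    return text.toLowerCase().match(/\p{L}+/gu) || [];
  }

  // Record one labeled example.
  train(text, label) {
    if (!this.wordCounts.has(label)) {
      this.wordCounts.set(label, new Map());
      this.docCounts.set(label, 0);
      this.totalWords.set(label, 0);
    }
    this.docCounts.set(label, this.docCounts.get(label) + 1);
    this.totalDocs += 1;
    const counts = this.wordCounts.get(label);
    for (const word of NaiveBayes.tokenize(text)) {
      counts.set(word, (counts.get(word) || 0) + 1);
      this.totalWords.set(label, this.totalWords.get(label) + 1);
      this.vocab.add(word);
    }
  }

  // Return the label with the highest log-probability, or null if untrained.
  classify(text) {
    const tokens = NaiveBayes.tokenize(text);
    let bestLabel = null;
    let bestScore = -Infinity;
    for (const [label, docs] of this.docCounts) {
      const counts = this.wordCounts.get(label);
      const denominator = this.totalWords.get(label) + this.vocab.size;
      // Log prior plus the sum of smoothed log likelihoods.
      let score = Math.log(docs / this.totalDocs);
      for (const word of tokens) {
        score += Math.log(((counts.get(word) || 0) + 1) / denominator);
      }
      if (score > bestScore) {
        bestScore = score;
        bestLabel = label;
      }
    }
    return bestLabel;
  }
}

// Made-up training data, for illustration only.
const classifier = new NaiveBayes();
classifier.train("Whisk the eggs and flour then bake in the oven", "cooking");
classifier.train("Simmer the sauce and add chopped onions", "cooking");
classifier.train("The meeting was moved to Monday morning", "other");
classifier.train("Install the package and restart the server", "other");

console.log(classifier.classify("Bake the onions in the oven"));  // cooking
console.log(classifier.classify("Restart the server on Monday")); // other
```

The model only stores word counts per label, which is why it stays small, and classifying a block is a single pass over its words.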
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | sks |
