Designing a tag system with NoSQL/Elasticsearch
I have to design a system with this schema.
{
"documentId" : 123,
"documentType" : "paper",
"tags" : ["abc", "xyz"]
//other meta data of document
}
The queries I will run are: find the k most popular tags, get documents by tag, add, remove, and update tags, and get all tags of a document. What is the optimal strategy for this, considering that the DB should be highly scalable? I am considering the following solutions:
- Create a document in a NoSQL DB like MongoDB and index the tags array, so that MongoDB is my primary DB.
- Use Elasticsearch as the primary DB, index the full document, and serve all the queries through search.
- Use Kafka with a Spark/Storm streaming solution.
- Design a slow and a fast pipeline as in this video: https://www.youtube.com/watch?v=kx-XDoPjoHw&t=1835s (I am not sure whether this is how Spark works internally).
What is the optimal way to handle such cases?
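To make the access patterns concrete, here is a minimal in-memory sketch in Python of the operations listed above. It is not tied to MongoDB, Elasticsearch, or Kafka; the class name `TagIndex` and all method names are hypothetical and chosen only for illustration. It keeps a forward map (document to tags), an inverted map (tag to documents), and a per-tag counter, which is roughly the shape any of the candidate stores would have to support.

```python
from collections import Counter, defaultdict


class TagIndex:
    """Illustrative in-memory model of the tag queries (hypothetical names)."""

    def __init__(self):
        self.doc_tags = defaultdict(set)  # documentId -> set of tags
        self.tag_docs = defaultdict(set)  # tag -> set of documentIds
        self.tag_counts = Counter()       # tag -> number of documents carrying it

    def add_tag(self, doc_id, tag):
        # Adding a tag the document already has is a no-op.
        if tag not in self.doc_tags[doc_id]:
            self.doc_tags[doc_id].add(tag)
            self.tag_docs[tag].add(doc_id)
            self.tag_counts[tag] += 1

    def remove_tag(self, doc_id, tag):
        # Removing a tag the document does not have is a no-op.
        if tag in self.doc_tags.get(doc_id, ()):
            self.doc_tags[doc_id].discard(tag)
            self.tag_docs[tag].discard(doc_id)
            self.tag_counts[tag] -= 1
            if self.tag_counts[tag] == 0:
                # Drop tags that no document uses any more.
                del self.tag_counts[tag]
                del self.tag_docs[tag]

    def update_tag(self, doc_id, old_tag, new_tag):
        # Rename a tag on one document; does nothing if old_tag is absent.
        if old_tag in self.doc_tags.get(doc_id, ()):
            self.remove_tag(doc_id, old_tag)
            self.add_tag(doc_id, new_tag)

    def tags_of(self, doc_id):
        return sorted(self.doc_tags.get(doc_id, ()))

    def docs_with(self, tag):
        return sorted(self.tag_docs.get(tag, ()))

    def top_k(self, k):
        # k most popular tags as (tag, document_count) pairs.
        return self.tag_counts.most_common(k)


index = TagIndex()
index.add_tag(123, "abc")
index.add_tag(123, "xyz")
index.add_tag(456, "abc")
print(index.top_k(1))
print(index.docs_with("abc"))
```

The point of the sketch is that "k popular tags" needs a maintained counter or an aggregation, while "documents by tag" needs an inverted index on the tags array; how cheaply a given store keeps both up to date under writes is what the choice hinges on.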
Solution 1:[1]
It depends:
- Do we need free-text search for the tag system?
- What is the update rate (number of docs updated every minute)?
IMHO, if the answer to Q1 is yes and the update rate is low, use ES.
If the answer to Q1 is no and the update rate is high, you may want to consider a non-Elasticsearch solution.
If the update rate is high and the answer to Q1 is yes, consider a non-Elasticsearch solution (depending on the size of your index, it is quite possible to use ES, though it may not be optimal).
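The rule of thumb above can be written down as a small decision function. This is only a restatement of the three cases in this answer, not a benchmark-backed recommendation; the function name `suggest_store` and its return strings are hypothetical, and the case the answer does not mention (no free-text search, low update rate) is reported as not covered rather than guessed.

```python
def suggest_store(needs_free_text_search: bool, high_update_rate: bool) -> str:
    """Restate the answer's rule of thumb (hypothetical helper)."""
    if needs_free_text_search and not high_update_rate:
        # Q1 is yes and the update rate is low.
        return "Elasticsearch"
    if high_update_rate and not needs_free_text_search:
        # Q1 is no and the update rate is high.
        return "non-Elasticsearch"
    if high_update_rate and needs_free_text_search:
        # Q1 is yes and the update rate is high: ES can still work,
        # depending on index size, but may not be optimal.
        return "non-Elasticsearch (ES possible depending on index size)"
    # Q1 is no and the update rate is low: the answer does not address this.
    return "not covered by this answer"


print(suggest_store(needs_free_text_search=True, high_update_rate=False))
```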
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |