Elasticsearch is returning less relevant results based on insignificant words in the query
I'm building a small web search engine using Elasticsearch. I'm using the following query:
```json
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "how to format code better golang",
              "boost": 3,
              "fuzziness": "AUTO"
            }
          }
        },
        {
          "match": {
            "keywords": {
              "query": "how to format code better golang",
              "boost": 2,
              "fuzziness": "AUTO"
            }
          }
        },
        {
          "match": {
            "description": {
              "query": "how to format code better golang",
              "boost": 1,
              "fuzziness": "AUTO"
            }
          }
        }
      ]
    }
  }
}
```
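If you generate this query from application code, a small helper avoids repeating the query text once per field. The sketch below is Python; `build_search_query` is a hypothetical helper name, and it only builds the request body as a dict, on the assumption that you send it with whatever HTTP or Elasticsearch client you already use. It does not talk to a cluster.

```python
import json


def build_search_query(text, field_boosts, fuzziness="AUTO"):
    """Build a bool/should query with one fuzzy match clause per field.

    field_boosts maps field name -> boost, e.g. {"title": 3}.
    Clauses are emitted in the dict's insertion order.
    """
    should = [
        {"match": {field: {"query": text, "boost": boost, "fuzziness": fuzziness}}}
        for field, boost in field_boosts.items()
    ]
    return {"query": {"bool": {"should": should}}}


query_body = build_search_query(
    "how to format code better golang",
    {"title": 3, "keywords": 2, "description": 1},
)
print(json.dumps(query_body, indent=2))
```

The printed body has the same structure as the hand-written query: a `should` clause per field, so each matching field adds to the document's score.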
When I run this query, these are the first two results (they're edited after querying, but the score/position hasn't been tampered with):
```json
{
  "id": "7a8a9b4b96c05460f32d18bba0804fdf",
  "score": 4651,
  "meta": {
    "url": "https://www.youtube.com/watch?v=IhC7sdYe-Jg",
    "title": "How a Compiler Works in ~1 minute - YouTube",
    "description": "A quick video explaining what a compiler does and how it works. The simple compiler I wrote is available in GitHub: http://www.github.com/charles-l/koona.Red...",
    "keywords": "tutorial, Compiler (Software Genre), compiler, computer, code, language, programming language, clang, gcc, lexer, parser, generator, ruby, how to write a compiler"
  }
},
{
  "id": "59c42e9f27efc9eea64b25d31d8146d1",
  "score": 4224,
  "meta": {
    "url": "https://dev.to/ksingh7/golang-automatic-code-formatting-code-like-a-pro-205a",
    "title": "Golang automatic code formatting : Code like a Pro - DEV Community",
    "description": "Why Format your code? Everyone loves clean readable and beautifully organized code using... Tagged with go, formatting, vscode.",
    "keywords": "go, formatting, vscode, software, coding, development, engineering, inclusive, community"
  }
}
```
Of course, I expect the second result to be more relevant than the first one, but it isn't. I tried a few different queries, and on almost every one the result I want at the top ends up second or lower. Sometimes it did end up on top, but if I then added the word "in" to the query (e.g. "how to format code better in golang"), it would drop back to second.
Is there any way I can make results more relevant?
Solution 1:[1]
You haven't shared your index mapping and settings, so I assume you are using the default `standard` analyzer, which doesn't remove English stop words such as `this`, `is`, `to` and `in`, i.e. the insignificant terms in your case. To fix the issue, use the `english` analyzer instead: it removes these terms at both index and query time and also stems words (e.g. `formatting` is reduced to `format`), which gives the second document a much better score. Note that `how` is not on the default English stop word list, so it is still kept.
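To see which terms survive each analyzer, here is a rough Python emulation. It is only a sketch under simplifying assumptions: `tokenize` approximates the `standard` analyzer with a lowercase regex split, `english_like` additionally drops Lucene's default English stop words but does no stemming, and `overlap` just counts shared distinct terms, which is not how Elasticsearch scores (BM25). All three function names are hypothetical. In Elasticsearch itself, the `_analyze` API shows the real tokens an analyzer produces.

```python
import re

# Lucene's default English stop word list (the "english" analyzer's stop filter).
ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
    "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
    "their", "then", "there", "these", "they", "this", "to", "was", "will", "with",
}


def tokenize(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def english_like(text):
    """Rough stand-in for the english analyzer: tokenize, then drop stop words (no stemming)."""
    return [t for t in tokenize(text) if t not in ENGLISH_STOP_WORDS]


def overlap(query_terms, field_text, analyze):
    """Count distinct query terms that also occur in the analyzed field (not a BM25 score)."""
    return len(set(query_terms) & set(analyze(field_text)))


query = "how to format code better in golang"
doc1_title = "How a Compiler Works in ~1 minute - YouTube"
doc2_title = "Golang automatic code formatting : Code like a Pro - DEV Community"

print(tokenize(query))      # ['how', 'to', 'format', 'code', 'better', 'in', 'golang']
print(english_like(query))  # ['how', 'format', 'code', 'better', 'golang']
print(overlap(tokenize(query), doc1_title, tokenize),
      overlap(tokenize(query), doc2_title, tokenize))          # 2 2
print(overlap(english_like(query), doc1_title, english_like),
      overlap(english_like(query), doc2_title, english_like))  # 1 2
```

With every word kept, `how` and `in` let the compiler video tie with the Go article on title terms in this crude count; once stop words are dropped, only `how` still matches it. The real `english` analyzer would also stem `formatting` to `format`, giving the second document one more matching term.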
For example, create the index with the `english` analyzer set on all three fields:
```
PUT /my-index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english"
      },
      "description": {
        "type": "text",
        "analyzer": "english"
      },
      "keywords": {
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}
```
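If you create indices from code, the same mapping body can be generated for any list of fields. This Python sketch assumes you send the returned dict as the body of `PUT /my-index` with your own client; `english_text_mappings` is a hypothetical helper name.

```python
import json


def english_text_mappings(fields):
    """Map every listed field as "text" analyzed with the built-in "english" analyzer."""
    # With no search_analyzer set, the same analyzer is also applied at query time.
    return {
        "mappings": {
            "properties": {
                name: {"type": "text", "analyzer": "english"} for name in fields
            }
        }
    }


mapping_body = english_text_mappings(["title", "description", "keywords"])
print(json.dumps(mapping_body, indent=2))
```

An existing field's analyzer cannot be changed in place, so if your current index already has these fields, this means creating a new index with the mapping and reindexing your documents into it.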
After indexing both sample documents, the same search produced the following result for me:
```json
"hits": [
  {
    "_index": "71413449",
    "_type": "_doc",
    "_id": "2",
    "_score": 10.55508,
    "_source": {
      "title": "Golang automatic code formatting : Code like a Pro - DEV Community",
      "description": "Why Format your code? Everyone loves clean readable and beautifully organized code using... Tagged with go, formatting, vscode.",
      "keywords": "go, formatting, vscode, software, coding, development, engineering, inclusive, community"
    }
  },
  {
    "_index": "71413449",
    "_type": "_doc",
    "_id": "1",
    "_score": 5.1878767,
    "_source": {
      "title": "How a Compiler Works in ~1 minute - YouTube",
      "description": "A quick video explaining what a compiler does and how it works. The simple compiler I wrote is available in GitHub: http://www.github.com/charles-l/koona.Red...",
      "keywords": "tutorial, Compiler (Software Genre), compiler, computer, code, language, programming language, clang, gcc, lexer, parser, generator, ruby, how to write a compiler"
    }
  }
]
```
Notice that the Go formatting article now ranks first, and its score is about twice that of the compiler video (10.56 vs. 5.19).
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Amit |
