Why is the max_score higher than the _score-sorted first hit's _score in Elasticsearch?
I have an Elasticsearch (8.1) index on which I run a simple match or multi_match query (I tried both; they show the same behavior, even in their simplest form).
In the result it is always the case that max_score is higher than the first hit's _score.
If I add a terms aggregation (on a keyword field) with a top_hits sub-aggregation (with sorting on _score) then the first hit from the first bucket actually has _score == max_score (but it is obviously also a different hit compared to the "main" hits). So, the top_hits aggregation actually does what I want ("fetch all matching documents and sort by _score"). The "main" hits seem to miss some results, however.
How can I make sure that the "main" hits do not "drop" documents? What are the internal mechanics behind this?
Here is the PHP array that gets JSON-encoded to produce the Elasticsearch query:
[
    'size' => 10,
    'query' => [
        // the result of this does not have all documents
        // that appear in the aggregation
        // and the highest ranked doc has lower score than max_score
        'bool' => [
            'must' => [
                [
                    'match' => [
                        'my_text_field' => [
                            'query' => 'searchword'
                        ]
                    ]
                ],
                ['term' => ['my_other_field' => ['value' => 3]]],
                // plus some more other simple term conditions
                // on other simple integral fields, but no scripts or similar
                // simple "WHERE a = 5" conditions
            ]
        ]
    ],
    // this aggregation has other/more hits than the directly retrieved docs, matching the max_score
    // If I remove the aggregation nothing changes for the actual result
    'aggs' => [
        'my_agg' => [
            'terms' => ['field' => 'my_agg_field', 'order' => ['score' => 'desc']],
            'aggs' => [
                'score' => ['max' => ['script' => '_score']],
                'filteredHits' => [
                    'top_hits' => [
                        'size' => 10
                    ]
                ]
            ]
        ]
    ]
]
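To make the mismatch easier to inspect, a small helper along these lines could pull the relevant scores out of the decoded response. This is only a sketch: `summarizeScores` and the sample `$response` are hypothetical (the numbers are made up for illustration, not real output), and the keys assume the usual search response layout (`hits.max_score`, `hits.hits[]._score`, and the `top_hits` results under each bucket of `my_agg`).

```php
<?php
// Hypothetical helper: collect the scores that are being compared in the question.
function summarizeScores(array $response, string $aggName = 'my_agg'): array
{
    $hits = $response['hits']['hits'] ?? [];

    $bucketTopScores = [];
    foreach ($response['aggregations'][$aggName]['buckets'] ?? [] as $bucket) {
        // First top_hits entry of each bucket; 'filteredHits' is the
        // sub-aggregation name used in the query above.
        $bucketTopScores[$bucket['key']] =
            $bucket['filteredHits']['hits']['hits'][0]['_score'] ?? null;
    }

    return [
        'max_score'         => $response['hits']['max_score'] ?? null,
        'first_hit_score'   => $hits[0]['_score'] ?? null,
        'bucket_top_scores' => $bucketTopScores,
    ];
}

// Made-up response fragment, for illustration only.
$response = [
    'hits' => [
        'max_score' => 7.5,
        'hits' => [
            ['_id' => 'a', '_score' => 6.0],
        ],
    ],
    'aggregations' => [
        'my_agg' => [
            'buckets' => [
                [
                    'key' => 'x',
                    'filteredHits' => [
                        'hits' => ['hits' => [['_id' => 'b', '_score' => 7.5]]],
                    ],
                ],
            ],
        ],
    ],
];

$summary = summarizeScores($response);
print_r($summary);
```

With a real response, the array returned by `json_decode($body, true)` would be passed in place of the sample, which shows at a glance whether `max_score` matches the first "main" hit or only a hit inside one of the buckets.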
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
