Missing text from Elasticsearch highlighted text when field contains exclamation mark
When I search for text and request highlighting of the results, the returned highlight omits part of the text if the matched document field contains an exclamation mark.
Elasticsearch version 7.1.1
document: { "name" : "Yahoo! Inc [Please refer to Altaba Inc and Verizon Communications Inc]"}
searching with highlight for "inc" wildcard
expected: highlighted text should be:
"Yahoo! <em>Inc</em> [Please refer to Altaba <em>Inc</em> and Verizon Communications <em>Inc</em>]"
actual: "Yahoo!" is missing from the response. Got:
"<em>Inc</em> [Please refer to Altaba <em>Inc</em> and Verizon Communications <em>Inc</em>]"
I think this has something to do with the ! character. If I remove it, everything works as expected.
Steps to reproduce:
Add document to a new index
POST test/_doc/
{ "name" : "Yahoo! Inc [Please refer to Altaba Inc and Verizon Communications Inc]" }
no other settings / mapping
Run the query
GET test/_search
{
  "query": {
    "bool": {
      "should": [
        { "wildcard": { "name": { "wildcard": "inc*" } } }
      ]
    }
  },
  "highlight": {
    "fields": { "name" : {} }
  }
}
Got following results:
"hits" : [
  {
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "511tP3ABoqekxkoUshVf",
    "_score" : 1.0,
    "_source" : { "name" : "Yahoo! Inc [Please refer to Altaba Inc and Verizon Communications Inc]" },
    "highlight" : {
      "name" : [ "<em>Inc</em> [Please refer to Altaba <em>Inc</em> and Verizon Communications <em>Inc</em>]" ]
    }
  }
]
expecting highlight:
"Yahoo! <em>Inc</em> [Please refer to Altaba <em>Inc</em> and Verizon Communications <em>Inc</em>]"
Solution 1:[1]
This is expected behavior: by default, the Elasticsearch highlighter returns only fragments of the searched text, not the whole field value. See: https://www.elastic.co/guide/en/elasticsearch/reference/7.1/search-request-highlighting.html#unified-highlighter
The default (unified) highlighter breaks the text into sentences, and ! and . are treated as the end of a sentence. "Yahoo!" therefore becomes a fragment of its own, and because that fragment contains no match, the highlighter does not return it.
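To make the effect concrete, here is a minimal Python sketch of sentence-based fragmenting. It is only a simplified model, not the Lucene implementation: the regex split and the helper names `sentence_fragments` and `highlight` are assumptions made for illustration, and a whole-word match on "inc" stands in for the wildcard query.

```python
import re


def sentence_fragments(text):
    """Split after '.', '!' or '?' followed by whitespace.

    A rough stand-in for a sentence boundary scanner (assumption,
    not the real highlighter logic).
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def highlight(text, term, pre="<em>", post="</em>"):
    """Return only the fragments that contain the term, with matches tagged."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    fragments = []
    for sentence in sentence_fragments(text):
        if pattern.search(sentence):
            tagged = pattern.sub(lambda m: pre + m.group(0) + post, sentence)
            fragments.append(tagged)
    return fragments


name = "Yahoo! Inc [Please refer to Altaba Inc and Verizon Communications Inc]"
print(sentence_fragments(name))
print(highlight(name, "inc"))
```

In this model "Yahoo!" ends up as a separate fragment with no match, so it is dropped, which mirrors the response shown in the question.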
In my case, the searched field holds a name, which is short, so by adding "number_of_fragments" : 0 I force the highlighter to return the entire field value.
"highlight": {
  "fields": {
    "name" : {"number_of_fragments" : 0}
  }
}
This is the same issue as: https://github.com/elastic/elasticsearch/issues/52333
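For reference, the query from the question combined with this setting could be assembled as below. This is a sketch using only the Python standard library; the helper name `build_search_body` is hypothetical, and the resulting JSON would be sent as the body of GET test/_search with whatever client you use.

```python
import json


def build_search_body(field="name", pattern="inc*"):
    """Build the search body from the question with number_of_fragments set to 0."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"wildcard": {field: {"wildcard": pattern}}}
                ]
            }
        },
        "highlight": {
            # 0 asks the highlighter for the whole field value instead of fragments
            "fields": {field: {"number_of_fragments": 0}}
        },
    }


print(json.dumps(build_search_body(), indent=2))
```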
Solution 2:[2]
As andreyro says, it is expected behavior for the unified (default) Elasticsearch highlighter. I had this same issue and reducing the number of fragments just made the issue worse. Fortunately, you can change which highlighter is used. I added the following and the issue was fixed.
"highlight": {
  "fields": {
    "*": {
      "type": "plain"
    }
  }
}
Replace the wildcard "*" as needed for whatever fields you are searching. See the same documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/highlighting.html#set-highlighter-type
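If you prefer to list fields explicitly rather than use "*", the highlight section can be generated per field. The sketch below assumes a hypothetical helper `plain_highlight`; only the "type": "plain" setting comes from the answer above.

```python
import json


def plain_highlight(fields):
    """Build a highlight section that selects the plain highlighter per field."""
    return {"fields": {field: {"type": "plain"} for field in fields}}


print(json.dumps({"highlight": plain_highlight(["name"])}, indent=2))
```

Note that the plain highlighter still splits long values into fragments (fragment_size defaults to 100 characters), so this works here mainly because the name is short.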
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | andreyro |
Solution 2 |