Query Doesn't Match Numbers In Text
I expect a match query to find strings that contain numbers; in this case, I am trying to search for matching phone numbers. The mapping and analyzers are provided below. For example, I have a document indexed as follows:
{
  "userId": 126817,
  "name": "Test User",
  "phoneNumber": "5551112233"
}
When I use a match query, it doesn't match anything:
{"match" : {"phoneNumber": {"query": "555"}}}
When I use a prefix query, it does match:
{"prefix" : {"phoneNumber": {"value": "555"}}}
Analyze Results
{
  "tokens": [
    {
      "token": "5551112233",
      "start_offset": 0,
      "end_offset": 10,
      "type": "<NUM>",
      "position": 0
    }
  ]
}
Mapping
{
  index: "user-clinics",
  type: "user-clinic",
  body: { properties: { id: { type: "long" } } }
}
Analyzers
const TurkishAnalyzer = {
  analysis: {
    filter: {
      my_ascii_folding: {
        type: "asciifolding",
        preserve_original: true
      }
    },
    analyzer: {
      turkish_analyzer: {
        tokenizer: "standard",
        filter: ["lowercase", "my_ascii_folding"]
      }
    }
  }
};
const AutoCompleteAnalyzer = {
  analysis: {
    filter: {
      autocomplete_filter: {
        type: "edge_ngram",
        min_gram: 1,
        max_gram: 20
      }
    },
    analyzer: {
      autocomplete_search: {
        type: "custom",
        tokenizer: "standard",
        filter: ["lowercase"]
      },
      autocomplete_index: {
        type: "custom",
        tokenizer: "standard",
        filter: ["lowercase", "autocomplete_filter"]
      }
    }
  }
};
Solution 1:[1]
It's because the edge_ngram filter only produces grams anchored at the beginning of each token, hence only prefixes are indexed, e.g. a, as, asd, asd1, asd12, asd123 for the token asd123.
You need to change your autocomplete_filter to ngram if you also want to be able to match inside tokens, e.g. d12 or 123.
Beware, though, that this might generate a lot more tokens.
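To make the difference concrete, here is a minimal sketch. The helpers `edgeNgrams` and `ngrams` are hypothetical, plain JavaScript functions (not Elasticsearch APIs) that only imitate which grams each filter would emit for a single token. `NgramAutoCompleteAnalyzer` is likewise an illustrative name, and its `min_gram`/`max_gram` values are assumptions chosen for the example, not recommendations.

```javascript
// Hypothetical helper: grams anchored at the start of the token (edge_ngram-like).
function edgeNgrams(token, minGram, maxGram) {
  const grams = [];
  const limit = Math.min(maxGram, token.length);
  for (let len = minGram; len <= limit; len++) {
    grams.push(token.slice(0, len));
  }
  return grams;
}

// Hypothetical helper: grams starting at every position of the token (ngram-like).
function ngrams(token, minGram, maxGram) {
  const grams = [];
  for (let start = 0; start < token.length; start++) {
    const limit = Math.min(maxGram, token.length - start);
    for (let len = minGram; len <= limit; len++) {
      grams.push(token.slice(start, start + len));
    }
  }
  return grams;
}

// Assumed variant of the question's settings with the filter type switched to "ngram".
// The gram sizes here are illustrative assumptions only.
const NgramAutoCompleteAnalyzer = {
  analysis: {
    filter: {
      autocomplete_filter: {
        type: "ngram",
        min_gram: 3,
        max_gram: 4
      }
    },
    analyzer: {
      autocomplete_search: {
        type: "custom",
        tokenizer: "standard",
        filter: ["lowercase"]
      },
      autocomplete_index: {
        type: "custom",
        tokenizer: "standard",
        filter: ["lowercase", "autocomplete_filter"]
      }
    }
  }
};

console.log(edgeNgrams("asd123", 1, 20)); // [ 'a', 'as', 'asd', 'asd1', 'asd12', 'asd123' ]
console.log(ngrams("asd123", 3, 3));      // [ 'asd', 'sd1', 'd12', '123' ]
```

With the same 1 to 20 gram range, the 10-character token 5551112233 yields 10 edge grams but 55 grams in total, which illustrates the warning about the token count. Note that these analyzers only take effect for a field whose mapping references them.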
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Val |
