Match query fuzzily to an array of candidates
I have an index in elastic with the following document structure:
{
"questions": [
"What is your name?",
"How are you called?",
"What should I call you?",
...
],
"answer": "<answer>"
}
I would like to match queries to one of the entries in the questions array.
For example, the query "What's your name?"
The returned document should be the one whose questions array contains the closest-matching entry across all documents in the index.
I have tried:
{
"query": {
"match": { "questions": { "query": "<question>", "fuzziness": "auto" } }
}
}
But that sometimes returns a "wrong" document, even when the query is exactly one of the entries of questions in one of the documents.
I've also tried:
{
"query": {
"match_phrase": { "questions": "<query>" }
}
}
But that doesn't allow fuzziness, and since the queries are human input, it misses too many cases.
And lastly I tried:
{
"query": {
"span_near": {
"clauses": [
{ "span_multi": {
"match": {
"fuzzy": {
"questions": { "fuzziness": "auto", "value": "<first word of the query>" }
}
}
} },
{ "span_multi": {
"match": {
"fuzzy": {
"questions": { "fuzziness": "auto", "value": "<second word of the query>" }
}
}
} },
...
]
}
}
}
But that (at least as far as I can tell) only matches when every word of a question is present, with fuzziness applied to each individual word.
What I would like (at least as far as I understand it) is a fuzzy TF-IDF score computed separately for each entry of questions. Each document would then be scored by its single best-matching entry, not by the questions array as a whole, and the documents ranked by that score.
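To make the ranking I have in mind concrete, here is a minimal sketch in plain Python, outside of Elastic. It is only an illustration of the "best entry wins" scoring: it uses `difflib.SequenceMatcher` character similarity as a stand-in for fuzzy TF-IDF, and the function names (`entry_score`, `rank_documents`) and the sample documents are hypothetical, not taken from my real index.

```python
from difflib import SequenceMatcher


def entry_score(query: str, candidate: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for fuzzy TF-IDF."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def rank_documents(query, documents):
    """Rank documents by their single best-matching question entry."""
    scored = []
    for doc in documents:
        # Score every entry separately and keep only the best one,
        # so the other entries in the array never dilute the score.
        best = max(doc["questions"], key=lambda q: entry_score(query, q))
        scored.append((entry_score(query, best), best, doc["answer"]))
    # Highest-scoring document first.
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Hypothetical sample data, assumed for illustration only.
documents = [
    {
        "questions": [
            "What is your name?",
            "How are you called?",
            "What should I call you?",
        ],
        "answer": "<answer 1>",
    },
    {
        "questions": ["How old are you?", "What is your age?"],
        "answer": "<answer 2>",
    },
]

ranking = rank_documents("What's your name", documents)
top_score, top_question, top_answer = ranking[0]
```

The important part is the `max` over entries: a document's score depends only on its closest entry, which is the behaviour I would like to reproduce with an Elastic query.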
I'm fairly new to Elastic, so I'd appreciate any tips, tricks, or outright solutions you might have. Thank you!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
