How to use fuzzy and match phrase?

I'm using Elasticsearch version 7.6.2.

I want to search for a sentence and get results whose words appear in the same order (like match_phrase), with fuzziness applied to the sentence as a whole.

Example:

PUT demo_idx/_doc/1
{
  "content": "michael jordan and scottie pippen"
}

I want to search for the following sentences (with fuzziness equal to 2):

  • "michael jordan and scottie pippen" -> get results (reason: same sentence)
  • "scottie pippen and michael jordan" -> 0 results (reason: words not in the correct order)
  • "ichael jordan and scottie pippen" -> get results (reason: 'm' of michael is missing, 1 edit)
  • "ichae jordan and scottie pippen" -> get results (reason: 'm' + 'l' of michael are missing, 2 edits)
  • "ichael jordan and cottie pippen" -> get results (reason: 'm' of michael and 's' of scottie are missing, 2 edits)
  • "ichael jordan and cottie pippe" -> 0 results (reason: 'm' of michael, 's' of scottie and 'n' of pippen are missing, 3 edits)
  • "ichael jordan and ottie pippen" -> 0 results (reason: 'm' of michael and 's' + 'c' of scottie are missing, 3 edits)
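
The expected outcomes above follow from the edit distance between the query sentence and the indexed sentence, compared against a budget of 2. A minimal sketch to check them, assuming plain Levenshtein distance (Elasticsearch's fuzzy matching also counts transpositions by default, which does not affect these cases); the names `levenshtein`, `INDEXED`, `QUERIES` and `expected_hit` are illustrative only:

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


INDEXED = "michael jordan and scottie pippen"
MAX_EDITS = 2  # the fuzziness budget from the question

QUERIES = [
    "michael jordan and scottie pippen",
    "scottie pippen and michael jordan",
    "ichael jordan and scottie pippen",
    "ichae jordan and scottie pippen",
    "ichael jordan and cottie pippen",
    "ichael jordan and cottie pippe",
    "ichael jordan and ottie pippen",
]


def expected_hit(query: str) -> bool:
    """True when the whole sentence is within the edit budget."""
    return levenshtein(query, INDEXED) <= MAX_EDITS


for q in QUERIES:
    print(f"{q!r}: distance={levenshtein(q, INDEXED)}, hit={expected_hit(q)}")
```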

I read and tried the solution from the post "Elasticsearch Fuzzy Phrases", but I didn't get the required results.

I have tried:

"query": {
    "span_near": {
        "clauses": [
            {
                "span_multi": {
                    "match": {
                        "fuzzy": {
                            "content": {
                                "value": query,
                                "fuzziness": 2
                            }
                        }
                    }
                }
            }
        ]
    }
}

but it didn't work.

How can I write the search query to get the results I want?



Solution 1:[1]

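If you want exact matches with fuzziness, you can use the keyword tokenizer when defining the index. The whole field value is then indexed as a single term, so the fuzziness applies to the sentence as a whole: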

PUT test_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "custom_english"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_english": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ],
          "type":"custom"
        }
      }
    }
  }
}

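Then you can use the fuzzy query to get your search results. No fuzziness is set here, so the default AUTO applies, which allows up to 2 edits for terms longer than 5 characters: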

GET test_index/_search
{
  "query": {
    "fuzzy": {
      "content": {
        "value": "ichael jordan and ottie pippen"
      }
    }
  }
}
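
The same request body can be built from client code. The sketch below is one way to do it in Python, with the fuzziness stated explicitly instead of relying on the default; the helper name `build_sentence_fuzzy_query` is hypothetical. Because fuzzy is a term-level query, its value is not analyzed, so the sketch lowercases the input to mirror the lowercase filter of the index:

```python
import json


def build_sentence_fuzzy_query(sentence, field="content", fuzziness=2):
    """Build a fuzzy query body that treats the whole sentence as one term.

    Assumes the field is indexed with a keyword tokenizer plus a
    lowercase filter, as in the test_index mapping above.
    """
    return {
        "query": {
            "fuzzy": {
                field: {
                    # Not analyzed by Elasticsearch, so lowercase it here.
                    "value": sentence.lower(),
                    # The fuzzy query accepts at most 2 edits.
                    "fuzziness": fuzziness,
                }
            }
        }
    }


body = build_sentence_fuzzy_query("Ichael Jordan and Scottie Pippen")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the body of GET test_index/_search with whichever HTTP or Elasticsearch client is already in use.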

This works for all the test cases mentioned in the question. Note that it matches the entire field value only, not a phrase inside a longer text.
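
For a field that is analyzed into separate words (not the keyword-tokenizer index above), the span_near attempt from the question can be repaired by giving each word its own span_multi clause, since a fuzzy query compares its value against single indexed terms. The sketch below is an alternative under stated assumptions, not a drop-in equivalent: the helper name `build_ordered_fuzzy_phrase` is hypothetical, whitespace splitting only approximates the analyzer, and the fuzziness applies per word instead of per sentence, so a query with one edit in each of three words would still match.

```python
import json


def build_ordered_fuzzy_phrase(sentence, field="content", fuzziness=2):
    """Build a span_near query with one fuzzy span_multi clause per word.

    Assumption: the field is tokenized into lowercase words, e.g. by the
    standard analyzer. Fuzziness is per word, not per sentence.
    """
    terms = sentence.lower().split()
    clauses = [
        {
            "span_multi": {
                "match": {
                    "fuzzy": {field: {"value": term, "fuzziness": fuzziness}}
                }
            }
        }
        for term in terms
    ]
    return {
        "query": {
            "span_near": {
                "clauses": clauses,
                "slop": 0,         # no extra words allowed between terms
                "in_order": True,  # words must appear in the given order
            }
        }
    }


phrase_body = build_ordered_fuzzy_phrase("ichael jordan and scottie pippen")
print(json.dumps(phrase_body, indent=2))
```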

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Barkha Jain