Elasticsearch array property must contain given array items

I have documents that look like:

{
    "tags": [
        "tag1",
        "tag2"
    ],
    "name": "Example 1"
}

{
    "tags": [
        "tag1",
        "tag3",
        "tag4"
    ],
    "name": "Example 2"
}

What I want now is a terms search where the given array might look like:

[tag1, tag3]

and the expected hit should be:

{
    "tags": [
        "tag1",
        "tag3",
        "tag4"
    ],
    "name": "Example 2"
}

However, when I do a query like:

GET _search
{
    "query": {
        "filtered": {
           "query": {
               "match_all": {}
           },
           "filter": {
               "bool": {
                   "must": [
                      {
                          "terms": {
                             "tags": [
                                "tag1",
                                "tag3"
                             ]
                          }
                      }
                   ]
               }
           }
       }
    }
}

I get both "Example 1" and "Example 2" as hits, since each of them contains either tag1 or tag3. From the documentation for terms, I figured out that terms is actually a "contains any" query: a document matches if the field holds at least one of the given terms.
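To make the two behaviours concrete, here is a minimal plain-Python model (not Elasticsearch code) that treats each document's tags as a list and contrasts "any term matches" with "all terms match". The helper names match_any and match_all are made up for illustration.

```python
# Plain-Python model of the two matching rules; no Elasticsearch involved.
docs = [
    {"name": "Example 1", "tags": ["tag1", "tag2"]},
    {"name": "Example 2", "tags": ["tag1", "tag3", "tag4"]},
]

def match_any(doc, wanted):
    """What the terms filter does by default: at least one wanted tag is present."""
    return any(tag in doc["tags"] for tag in wanted)

def match_all(doc, wanted):
    """What is wanted here: every wanted tag is present."""
    return all(tag in doc["tags"] for tag in wanted)

wanted = ["tag1", "tag3"]
any_hits = [d["name"] for d in docs if match_any(d, wanted)]
all_hits = [d["name"] for d in docs if match_all(d, wanted)]
print(any_hits)  # ['Example 1', 'Example 2']
print(all_hits)  # ['Example 2']
```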

How can I make sure that Example 2 is the only hit when querying with tag1 and tag3?



Solution 1:[1]

You need to set the execution mode to "and" by adding "execution": "and" to the terms filter, so that a document must contain all of the terms to be considered a match:

GET _search
{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "terms": {
               "tags": [
                  "tag1",
                  "tag3"
               ],
               "execution": "and"
            }
         }
      }
   }
}

This is effectively the same as building a bool must filter with the conjunction of all terms, but in a more compact form.
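For reference, the expanded form mentioned above might look roughly like the sketch below. It is written in Python only to generate the request body, and it assumes a version where the bool query has a filter clause instead of the filtered wrapper (filtered was removed in Elasticsearch 5.0). The helper name all_tags_query is hypothetical.

```python
import json

def all_tags_query(field, values):
    """Build a bool query with one term clause per value (hypothetical helper).

    Clauses listed under bool.filter are ANDed together, so a document
    must contain every value to match.
    """
    return {
        "query": {
            "bool": {
                "filter": [{"term": {field: value}} for value in values]
            }
        }
    }

body = all_tags_query("tags", ["tag1", "tag3"])
print(json.dumps(body, indent=2))  # request body for GET _search
```

The printed JSON can be pasted as the body of a GET _search request. With many terms this gets verbose, which is the trade-off against the compact form above.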

Solution 2:[2]

For those who are looking at this in 2020: you might have noticed that minimum_should_match on the terms query was deprecated long ago.

There is an alternative currently available, which is to use terms_set.

For example:

{
  "query": {
    "terms_set": {
      "programming_languages": {
        "terms": [ "c++", "java", "php" ],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}

The above example assumes that each document has a field required_matches containing an integer that defines how many of the terms must match.
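As a rough illustration of that rule, here is a plain-Python model (again, not Elasticsearch code). The three sample documents and the helper terms_set_match are invented for the example; only the field names come from the query above.

```python
# Plain-Python model of terms_set with minimum_should_match_field.
# The sample documents are invented for illustration.
docs = [
    {"name": "A", "programming_languages": ["java", "php"], "required_matches": 2},
    {"name": "B", "programming_languages": ["c++", "java"], "required_matches": 3},
    {"name": "C", "programming_languages": ["java"], "required_matches": 1},
]

def terms_set_match(doc, field, terms, required_field):
    """Hit if the number of matching terms reaches the document's own threshold."""
    matched = len(set(doc[field]) & set(terms))
    return matched >= doc[required_field]

terms = ["c++", "java", "php"]
hits = [
    d["name"]
    for d in docs
    if terms_set_match(d, "programming_languages", terms, "required_matches")
]
print(hits)  # ['A', 'C']
```

Document B matches two terms but asks for three, so it is not a hit. The threshold lives in each document, not in the query.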

What is more useful is the alternative parameter minimum_should_match_script, which computes the threshold in a script instead of reading it from a field.

See the example below:

{
  "query": {
    "terms_set": {
      "programming_languages": {
        "terms": [ "c++", "java", "php" ],
        "minimum_should_match_script": {
          "source": "2"
        }
      }
    }
  }
}

You can always use this inside a filter context to make it work as a filter.
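Applied to the original question, a sketch could look like the following. It uses params.num_terms, which the terms_set documentation describes as the number of terms supplied in the query, so that every term is required, and it wraps the query in a bool filter. It assumes tags is indexed as exact values (for example a keyword field); the helper name must_contain_all is hypothetical and Python is used only to generate the request body.

```python
import json

def must_contain_all(field, values):
    """Build a filter-context terms_set query that requires every value (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "terms_set": {
                        field: {
                            "terms": list(values),
                            # params.num_terms = number of terms in this query,
                            # so all of them must match.
                            "minimum_should_match_script": {
                                "source": "params.num_terms"
                            },
                        }
                    }
                }
            }
        }
    }

body = must_contain_all("tags", ["tag1", "tag3"])
print(json.dumps(body, indent=2))  # request body for GET _search
```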

Read more in the Elasticsearch documentation for the terms_set query.

Solution 3:[3]

You can set minimum_should_match to the number of items in your array, so that all of them must match:

{
    "query": {
        "filtered": {
           "query": {
               "match_all": {}
           },
           "filter": {
               "bool": {
                   "must": [
                      {
                          "terms": {
                             "tags": ["tag1","tag3"],
                             "minimum_should_match": 2
                          }
                      }
                   ]
               }
           }
       }
    }
}
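Since Solution 2 points out that minimum_should_match on terms has been deprecated, the same idea can be sketched with a bool query instead, which accepts minimum_should_match over its should clauses. This is a minimal sketch under that assumption; the helper name at_least_n_tags is hypothetical and Python is used only to generate the request body.

```python
import json

def at_least_n_tags(field, values, n):
    """Build a bool query where at least n of the term clauses must match (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "should": [{"term": {field: value}} for value in values],
                "minimum_should_match": n,
            }
        }
    }

tags = ["tag1", "tag3"]
# Requiring len(tags) matches means every tag must be present.
body = at_least_n_tags("tags", tags, len(tags))
print(json.dumps(body, indent=2))  # request body for GET _search
```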

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Russ Cam
Solution 2 Abdul Vajid
Solution 3 chengpohi