'Elasticsearch search document with nested document with optional fields

I'm trying to create query for nested object that contains year and month. Both of them are optional. If some field not exists we treat them as hit. I found one solution but it causes combinatorial explosion of terms so I'm trying to find a better solution.

Steps of reproduction:

  1. Creating index with mapping
PUT /date-test
{
    "mappings": {
        "properties": {
            "datesOfBirth": {
                "type": "nested"
            }
        }
    }
}
  1. Add documents with nested objects
PUT /date-test/_doc/1
{
    "name": "Object1",
    "datesOfBirth": []
}
PUT /date-test/_doc/2
{
    "name": "Object2",
    "datesOfBirth": [
        {
            "year": 1990,
            "month": 4
        }
    ]
}
PUT /date-test/_doc/3
{
    "name": "Object3",
    "datesOfBirth": [
        {
            "year": 1995,
            "month": 2
        },
        {
            "year": 1998,
            "month": 4
        }
    ]
}
PUT /date-test/_doc/4
{
    "name": "Object4",
    "datesOfBirth": [
        {
            "month": 4
        }
    ]
}
  1. This query works as expected for year range 1994-1996 and month range 1-5 (objects 1, 3, 4 are returned):
POST /date-test/_search
{
    "size": 1000,
    "query": {
        "bool" : {
            "should": [
                { "bool": {"must_not": [ //match when all fields are absent
                        { "nested": { "path": "datesOfBirth", "query": { "exists": { "field": "datesOfBirth.year" }} }},
                        { "nested": { "path": "datesOfBirth", "query": { "exists": { "field": "datesOfBirth.month" }} }}
                    ]
                }},
                { "bool": {"must_not": [ //match when year is absent but month exists and match to range
                        { "nested": { "path": "datesOfBirth", "query": { "exists": { "field": "datesOfBirth.year" }} }}
                    ],
                    "should": [
                        {"nested": { "path": "datesOfBirth", "query": { "bool": { "must": [
                            { "range": { "datesOfBirth.month": { "gte": 1, "lte": 5} } }
                        ]
                        }}}}
                    ]
                }},
                { "bool": {"must_not": [ //match when month is absent but year exists and match to range
                        { "nested": { "path": "datesOfBirth", "query": { "exists": { "field": "datesOfBirth.month" }} }}
                    ],
                    "should": [
                        {"nested": { "path": "datesOfBirth", "query": { "bool": { "must": [
                            { "range": { "datesOfBirth.year": { "gte": 1994, "lte": 1996} } }
                        ]
                        }}}}
                    ]
                }},
                {"nested": { "path": "datesOfBirth", "query": { "bool": { "must": [ //both fields exists and must match to given ranges
                    { "range": { "datesOfBirth.year": { "gte": 1994, "lte": 1996} } },
                    { "range": { "datesOfBirth.month": { "gte": 1, "lte": 5} } }
                ]
                }}}}
            ],
            "minimum_should_match": 1
        }
    }
}

Is there better way to achieve that behaviour? I'm using Elasticsearch 7.1.



Solution 1:[1]

I've also tried always set field but with null in case of value absence and add mapping for year and month where I define null_value: -1. Then I can remove part with combination of field absence.

  1. Create index with mapping
PUT /date-test
{
    "mappings": {
        "properties": {
            "datesOfBirth": {
                "type": "nested",
                "properties": {
                    "year": { "type": "integer", "null_value": -1 },
                    "month": { "type": "integer", "null_value": -1 }
                }
            }
        }
    }
}
  1. Creating documents as follows:
PUT /date-test/_doc/7
{
    "name": "SomeObjectWithoutYear",
    "datesOfBirth": [
        {
            "year": null,
            "month": 4
        }
    ]
}

Then I can do query like this:

POST /date-test/_search
{
    "size": 1000,
    "query": {
        "bool" : {
            "should": [
                { "bool": {"must_not": [
                        { "nested": { "path": "datesOfBirth", "query": { "exists": { "field": "datesOfBirth.year" }} }},
                        { "nested": { "path": "datesOfBirth", "query": { "exists": { "field": "datesOfBirth.month" }} }}
                    ]
                }},
                {"nested": { "path": "datesOfBirth", "query": { "bool": { "should": [
                    { "match": { "datesOfBirth.year": { "query": -1 } } },
                    { "match": { "datesOfBirth.month": { "query": -1 } } },
                    { "range": { "datesOfBirth.year": { "gte": 1994, "lte": 1996} } },
                    { "range": { "datesOfBirth.month": { "gte": 1, "lte": 5} } }
                ],
                "minimum_should_match": 2
                }}}}
            ],
            "minimum_should_match": 1
        }
    }
}

But I'm wondering if it is the cleanest way to achieve that.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 kemot90