Elasticsearch - grouping by multiple fields

I thought grouping by multiple fields would be a simple operation in Elasticsearch queries, but it turns out not to be that trivial.

I'm looking for a way to query the latest document (based on the savedAt field) for each combination of the type, size and category fields.

Data:

POST data/_create/1
{
   "content": "some text ...",
   "type": "regular",
   "size": "medium",
   "category" : "a1",
   "savedAt": "2022-01-02 15:09:27.527+0200"
}

POST data/_create/2
{
   "content": "some other text ...",
   "type": "regular",
   "size": "small",
   "category" : "a1",
   "savedAt": "2022-01-02 16:09:27.527+0200"
}

POST data/_create/3
{
   "content": "some other text ...",
   "type": "regular",
   "size": "big",
   "category" : "a1",
   "savedAt": "2022-01-02 19:09:27.527+0200"
}

POST data/_create/4
{
   "content": "some other different text ...",
   "type": "regular",
   "size": "big",
   "category" : "a1",
   "savedAt": "2022-01-02 20:09:27.527+0200"
}

I expect the response to contain the documents with IDs 1, 2 and 4, one for each of these combinations:

  • regular - medium - a1
  • regular - small - a1
  • regular - big - a1
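The grouping described above (one latest document per type/size/category combination) can be expressed in a single request with a multi_terms aggregation plus a top_hits sub-aggregation. This is a sketch, not taken from the original post, assuming an Elasticsearch version that supports multi_terms (added in 7.12) and that the `.keyword` sub-fields used in the aggregation query further down exist; the names `combinations` and `latest` are arbitrary.

```
GET data/_search
{
  "size": 0,
  "aggs": {
    "combinations": {
      "multi_terms": {
        "terms": [
          { "field": "type.keyword" },
          { "field": "size.keyword" },
          { "field": "category.keyword" }
        ]
      },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "sort": [ { "savedAt": { "order": "desc" } } ]
          }
        }
      }
    }
  }
}
```

Each bucket corresponds to one combination, and its `latest` top_hits result holds the single document with the highest savedAt in that bucket.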

I can't use collapse because it doesn't support multiple fields.
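A common workaround, sketched here rather than taken from the original post, is to store the three values in one extra keyword field and collapse on that single field. The field name `group_key` and the `|` separator are hypothetical; documents indexed before the field was added would need to be indexed again so they contain it.

```
PUT data/_mapping
{
  "properties": {
    "group_key": { "type": "keyword" }
  }
}

POST data/_doc/1
{
   "content": "some text ...",
   "type": "regular",
   "size": "medium",
   "category" : "a1",
   "group_key": "regular|medium|a1",
   "savedAt": "2022-01-02 15:09:27.527+0200"
}

GET data/_search
{
  "collapse": { "field": "group_key" },
  "sort": [ { "savedAt": { "order": "desc" } } ]
}
```

Collapse keeps only the top document per `group_key` value according to the sort, so sorting by savedAt descending returns the latest document of each combination. The search `size` (10 by default) limits how many groups come back.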

I tried to use aggregations:

GET data/_search
{
  "size": 0,
  "aggs": {
    "agg1": {
      "terms": {
        "field": "type.keyword"
      },
      "aggs": {
        "agg2": {
          "terms": {
            "field": "size.keyword"
          },
          "aggs": {
            "agg3": {
              "terms": {
                "field": "category.keyword"
              }
            }
          }
        }
      }
    }
  }
}

But it doesn't return any buckets:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "my-agg-name" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ ]
    }
  }
}

Any suggestions?

UPDATE

This is the mapping used for this data:

PUT data
{
  "mappings": {
    "properties": {
      "savedAt": {
        "type":   "date",
        "format": "yyyy-MM-dd HH:mm:ss.SSSZ"
      },
      "type": {
        "type":   "text",
        "analyzer": "keyword"
      },
      "size": {
        "type":   "text",
        "analyzer": "keyword"
      },
      "category": {
        "type":   "text",
        "analyzer": "keyword"
      }
    }
  }
}
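With this explicit mapping, `type`, `size` and `category` are `text` fields without a `keyword` sub-field, so `type.keyword`, `size.keyword` and `category.keyword` in the aggregation query are unmapped, and a terms aggregation on an unmapped field returns an empty bucket list rather than an error. A sketch of a mapping that keeps the same fields but adds `.keyword` sub-fields, assuming the index can be deleted, recreated and the documents indexed again:

```
DELETE data

PUT data
{
  "mappings": {
    "properties": {
      "savedAt": {
        "type":   "date",
        "format": "yyyy-MM-dd HH:mm:ss.SSSZ"
      },
      "type": {
        "type":   "text",
        "analyzer": "keyword",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "size": {
        "type":   "text",
        "analyzer": "keyword",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "category": {
        "type":   "text",
        "analyzer": "keyword",
        "fields": { "keyword": { "type": "keyword" } }
      }
    }
  }
}
```

The `keyword` sub-fields are what terms (and multi_terms) aggregations read, so the original aggregation query can stay unchanged.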


Solution 1:[1]

You should index your documents by replacing _create with _doc. After this change, your query will work.

POST data/_doc/1
{
   "content": "some text ...",
   "type": "regular",
   "size": "medium",
   "category" : "a1",
   "savedAt": "2022-01-02 15:09:27.527+0200"
}
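Once the aggregation returns buckets, it still only groups the documents. To get the latest document for each type/size/category combination, a top_hits sub-aggregation sorted by savedAt can be nested under the innermost terms aggregation. A sketch, assuming the same `.keyword` fields as the original query; the name `latest` is arbitrary:

```
GET data/_search
{
  "size": 0,
  "aggs": {
    "agg1": {
      "terms": { "field": "type.keyword" },
      "aggs": {
        "agg2": {
          "terms": { "field": "size.keyword" },
          "aggs": {
            "agg3": {
              "terms": { "field": "category.keyword" },
              "aggs": {
                "latest": {
                  "top_hits": {
                    "size": 1,
                    "sort": [ { "savedAt": { "order": "desc" } } ]
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```

Each agg3 bucket then carries a `latest.hits.hits` array containing one document. Terms aggregations return only the top 10 buckets per level by default, so with more distinct values a larger `size` would be needed on each terms aggregation.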

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 andrecoelho.rabbitbr