How to return latest distinct rows in Elasticsearch ignoring a timestamp field

I have documents like:

{'foo': 'foo', 'bar': 'bar', ..., 'timestamp': 3}
{'foo': 'diffval', 'bar': 'diffval', ..., 'timestamp': 2}
{'foo': 'foo', 'bar': 'bar', ..., 'timestamp': 2}
{'foo': 'foo', 'bar': 'bar', ..., 'timestamp': 1}

I search them via _search?from=0&size=20&sort=timestamp%3Adesc

I would now like to return only the latest row for each distinct set of values, e.g.:

{'foo': 'foo', 'bar': 'bar', ..., 'timestamp': 3}
{'foo': 'diffval', 'bar': 'diffval', ..., 'timestamp': 2}

But I would like to do this without explicitly listing the foo, bar, etc. fields, as there could be many of them and they are not consistently present. The timestamp field, however, is always present.



Solution 1:[1]

I found a solution: before indexing each document, I compute a hash of all its fields except the timestamp and store it in a hash field on the document. I then use the collapse functionality in OpenSearch on that field. With the results sorted by timestamp descending, the hits contain only the latest document for each distinct hash.

GET /.../_search?sort=timestamp%3Adesc
{
  "collapse": {
    "field": "hash",
    "inner_hits": {
      "name": "",
      "size": 0,
      "sort": [
        {
          "timestamp": {
            "order": "desc"
          }
        }
      ]
    }
  }
}
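
The hashing step itself is not shown above, so here is a minimal Python sketch of how it might look. The helper names content_hash and with_hash, the choice of SHA-256, and the JSON canonicalisation are my own assumptions for illustration, not part of the original setup. Note that collapse requires the field to be a keyword or numeric type with doc values, so the hash field would need to be mapped as keyword.

```python
import hashlib
import json


def content_hash(doc, ignore=("timestamp", "hash")):
    """Return a SHA-256 hex digest of doc, skipping the ignored keys.

    Hypothetical helper: the name and the choice of SHA-256 are assumptions.
    """
    payload = {k: v for k, v in doc.items() if k not in ignore}
    # sort_keys makes the digest independent of key order;
    # default=str is a fallback for values json cannot serialise (e.g. dates).
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def with_hash(doc):
    """Return a copy of doc with a 'hash' field added, ready for indexing."""
    return {**doc, "hash": content_hash(doc)}


if __name__ == "__main__":
    docs = [
        {"foo": "foo", "bar": "bar", "timestamp": 3},
        {"foo": "diffval", "bar": "diffval", "timestamp": 2},
        {"foo": "foo", "bar": "bar", "timestamp": 1},
    ]
    for d in docs:
        print(with_hash(d))
```

Because the hash is computed from whatever keys are present, fields never have to be listed explicitly, and documents that differ only in timestamp end up with the same hash value.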

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1