How to return latest distinct rows in Elasticsearch ignoring a timestamp field
I have documents like:
{'foo': 'foo', 'bar': 'bar', ..., 'timestamp': 3}
{'foo': 'diffval', 'bar': 'diffval', ..., 'timestamp': 2}
{'foo': 'foo', 'bar': 'bar', ..., 'timestamp': 2}
{'foo': 'foo', 'bar': 'bar', ..., 'timestamp': 1}
I currently search them via _search?from=0&size=20&sort=timestamp%3Adesc
I would now like to return only the latest row for each distinct set of field values, e.g.:
{'foo': 'foo', 'bar': 'bar', ..., 'timestamp': 3}
{'foo': 'diffval', 'bar': 'diffval', ..., 'timestamp': 2}
I would like to do this without explicitly naming foo, bar and the other fields, since there can be many of them and they are not present in every document. The timestamp field, however, is always present.
Solution 1:[1]
I found a solution: before storing each document, I compute a hash of all its fields except the timestamp and save it in a hash field on the document. I then use the collapse functionality in OpenSearch on that field. With the search sorted by timestamp descending, the hits contain the latest document for each distinct hash.
GET /.../_search?sort=timestamp%3Adesc
{
  "collapse": {
    "field": "hash",
    "inner_hits": {
      "name": "",
      "size": 0,
      "sort": [
        {
          "timestamp": {
            "order": "desc"
          }
        }
      ]
    }
  }
}
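The hashing step described above is not shown in the answer, so here is a minimal sketch of how it might look in Python. The function name add_content_hash, the choice of SHA-256, and the JSON-based canonical form are all assumptions for illustration, not the author's actual code. Note that collapse requires the field to be a single-valued keyword or numeric field, so the hash field would presumably need to be mapped as keyword.

```python
import hashlib
import json


def add_content_hash(doc, ignore=("timestamp", "hash")):
    """Return a copy of doc with a 'hash' field derived from every
    field except the ignored ones (hypothetical helper)."""
    # Keep only the fields that should count towards "distinctness".
    payload = {k: v for k, v in doc.items() if k not in ignore}
    # sort_keys makes the serialization independent of field order;
    # default=str is a fallback for values json cannot encode natively.
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), default=str
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Store the digest alongside the original fields before indexing.
    return {**doc, "hash": digest}


newer = add_content_hash({"foo": "foo", "bar": "bar", "timestamp": 3})
older = add_content_hash({"bar": "bar", "foo": "foo", "timestamp": 1})
other = add_content_hash({"foo": "diffval", "bar": "diffval", "timestamp": 2})
```

Here newer and older get the same hash because they differ only in timestamp and field order, while other gets a different one, so collapsing on hash would keep one hit per distinct set of field values.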
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
