How to implement Elasticsearch pagination with a large dataset

Environment

.Net 5

Elasticsearch.Net.Aws 7.1.0

Problem

Even with pagination, Elasticsearch's search API does not return more than 10_000 records by default. That is, if from + size is greater than 10_000, the API throws an error.
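To make the limit concrete, here is a minimal sketch of the arithmetic, not a call to Elasticsearch itself. The function name `page_is_reachable` and the 1-based page numbering are assumptions made for illustration; the 10_000 figure is the default window mentioned above.

```python
# Default value of the index's max_result_window setting.
MAX_RESULT_WINDOW = 10_000


def page_is_reachable(page_number: int, page_size: int,
                      window: int = MAX_RESULT_WINDOW) -> bool:
    """Return True if a from/size request for this 1-based page fits in the window."""
    offset = (page_number - 1) * page_size  # the value sent as "from"
    return offset + page_size <= window


print(page_is_reachable(1000, 10))  # from=9_990, size=10: sum is exactly 10_000
print(page_is_reachable(1001, 10))  # from=10_000, size=10: sum is 10_010
```

With a page size of 10, page 1000 is the last one that plain from/size paging can serve under the default window.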

Potential solutions

Increase size


I can increase the index's max_result_window as described here. However, I am expecting a large dataset in production (probably fewer than 10_000_000 records at any one time), and for obvious reasons I don't believe that simply increasing the window size is a good idea. My use case does not require over-the-top performance, but it has to be reasonable for both the end user and the AWS bill.

What do you think? How much leeway do I have with the max_result_window setting?

Track total hits


I've read about the track_total_hits parameter. It only makes each request return the accurate total hit count; it still does not allow records after the 10_000th to be fetched.

Scroll API


I've read about the Scroll API. It is no longer recommended for deep pagination, so I'd like to avoid it.

Search after


I've read about the search_after parameter. The concept is to define a consistent sort order and issue the exact same query for each page, the only difference being the value of search_after, which for every subsequent search should be the sort values of the last hit returned by the previous search.
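The mechanism can be sketched without a cluster. The snippet below is only an in-memory simulation under stated assumptions: `DOCS`, `search` and `iterate_pages` are hypothetical names, each document is reduced to a (timestamp, id) sort key, and the id serves as a unique tiebreaker so the sort order is total. It is not the Elasticsearch client API.

```python
from typing import Iterator, List, Optional, Tuple

SortKey = Tuple[int, int]  # (timestamp, id); id is the unique tiebreaker

# Hypothetical data set: 25 documents, several sharing the same timestamp.
DOCS: List[SortKey] = [(1000 + i // 3, i) for i in range(25)]


def search(size: int, search_after: Optional[SortKey] = None) -> List[SortKey]:
    """Mimic a sorted search: the first `size` hits strictly after `search_after`."""
    ordered = sorted(DOCS)
    if search_after is not None:
        ordered = [doc for doc in ordered if doc > search_after]
    return ordered[:size]


def iterate_pages(size: int) -> Iterator[List[SortKey]]:
    """Walk forward page by page, feeding the last hit's sort key into the next call."""
    cursor: Optional[SortKey] = None
    while True:
        hits = search(size, cursor)
        if not hits:
            return
        yield hits
        cursor = hits[-1]  # sort values of the last hit become the next search_after


pages = list(iterate_pages(10))
print([len(page) for page in pages])  # [10, 10, 5]
```

Every call uses the same query and sort; only the cursor changes. Without the tiebreaker, documents sharing a timestamp could be skipped or repeated at page boundaries.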

As far as I can tell this is the recommended solution, but while it may work for large page sizes, I'm having difficulty understanding how it solves the basic paging case:

Let's say we have 20_000 records in total and the page size is 10, hence 2_000 pages. How can I return the last page, containing records 19_990-19_999 (counting from zero)? Unless I misunderstand, search_after does not help, because I've skipped pages and I don't have the sort value of record number 19_989.
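For a sense of scale, the sketch below counts how many sequential search_after requests a purely forward walk from the first record would need before the requested page can be returned. It is arithmetic only; `requests_to_reach_page` and its `batch_size` parameter are hypothetical, and it assumes the final skipping request may ask for fewer hits than `batch_size` so that it stops exactly at the page boundary.

```python
def requests_to_reach_page(target_page: int, page_size: int, batch_size: int) -> int:
    """Count forward search_after requests needed to serve a 1-based page.

    Assumes the walk starts at the first record, skips ahead `batch_size`
    hits per request, and then issues one final request for the page itself.
    """
    to_skip = (target_page - 1) * page_size
    skip_requests = -(-to_skip // batch_size)  # integer ceiling division
    return skip_requests + 1


# 20_000 records, page size 10, last page is page 2_000.
print(requests_to_reach_page(2_000, 10, 10))      # 2000
print(requests_to_reach_page(2_000, 10, 10_000))  # 3
```

Skipping in batches as large as the window allows cuts the number of round trips considerably, but every skipped hit is still fetched and thrown away.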

Furthermore, per the docs:

If provided, the from argument must be 0 (default) or -1

This means that I cannot use a combination of both:

  1. Perform one search with "from": "990"
  2. Use the last record's sort value as search_after in a second search, again using "from": "990"
  3. Return the results of the second search.

Beyond that I cannot figure out another way to use it. Could you tell me where I'm getting it wrong?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source