ExclusiveStartKey changes LastEvaluatedKey

I am trying to scan a large table in chunks: fetch a limited number of items, save the LastEvaluatedKey, and then use it as my ExclusiveStartKey when I resume the scan.

I have noticed that when I test on smaller tables, I may scan the entire table and get:

Key: A
Key: B
Key: C
Key: D
Key: E

Now, when I select key C as my ExclusiveStartKey, I would expect to get back D and E as I run through the rest of the table. However, I sometimes get different keys. Is this expectation correct?

Something that might be causing problems is that my keys do not all start with the same letter: some start with a U and some start with an N. If I use an ExclusiveStartKey that starts with a U, am I skipping every key that starts with an N? As I understand it, ExclusiveStartKey aims for things greater than its value.



Solution 1:[1]

DynamoDB keys have two parts: the hash key and the sort key. As the names suggest, the sort-key part is sorted (for strings, that's alphabetical order), but the hash-key part is not sorted alphabetically. Instead, it is sorted by the value of a hash function applied to the key, which means the order appears random but is consistent: if you scan the same table twice and it didn't change, you should get back the keys in the same seemingly random order. ExclusiveStartKey can be used to start in the middle of this order, but it shouldn't change the order.
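To make the ordering point concrete, here is a toy sketch. DynamoDB's real hash function is internal, so `toy_hash` (built on MD5) and the sample keys are purely assumptions for illustration. The point is only that the order follows the hash value rather than the first letter of the key, and that it is the same every time.

```python
import hashlib


def toy_hash(key: str) -> int:
    # Stand-in for DynamoDB's internal hash function (an assumption, not the real one).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def scan_order(keys):
    # A Scan walks hash keys in order of their hash value, not of the key text.
    return sorted(keys, key=toy_hash)


keys = ["U-100", "N-200", "U-300", "N-400", "N-500"]
order = scan_order(keys)
print(order)

# The order depends only on the keys themselves, not on how they were inserted.
print(scan_order(list(reversed(keys))) == order)
```

In this model, a key starting with N can land before or after a key starting with U, so starting from a U key does not by itself skip the N keys.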

In your example, if a Scan returned A, B, C, D, E in this order (note that, as I said, it usually will not be in alphabetical order if you have hash keys!), then if you set ExclusiveStartKey to C you should definitely get D and E from the scan. I don't know how you saw something else - I suspect you did something wrong.
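For reference, a chunked scan along these lines could look like the sketch below. It assumes `client` behaves like the low-level boto3 DynamoDB client (for example `boto3.client("dynamodb")`), whose `scan` call accepts `TableName`, `Limit` and `ExclusiveStartKey` and returns `Items` and, while more data remains, `LastEvaluatedKey`. The helper names `scan_chunk` and `scan_all` are made up for this example.

```python
def scan_chunk(client, table_name, chunk_size, start_key=None):
    """Fetch one chunk of a Scan and return (items, last_evaluated_key)."""
    kwargs = {"TableName": table_name, "Limit": chunk_size}
    if start_key is not None:
        # Pass the saved LastEvaluatedKey back unchanged, as the whole key map.
        kwargs["ExclusiveStartKey"] = start_key
    response = client.scan(**kwargs)
    # LastEvaluatedKey is absent once the scan has reached the end of the table.
    return response.get("Items", []), response.get("LastEvaluatedKey")


def scan_all(client, table_name, chunk_size):
    """Walk the whole table chunk by chunk, threading the key through."""
    items, start_key = [], None
    while True:
        chunk, start_key = scan_chunk(client, table_name, chunk_size, start_key)
        items.extend(chunk)
        if start_key is None:
            return items
```

One thing worth checking in your own code: the value given to ExclusiveStartKey should be the complete LastEvaluatedKey map from the previous response, not a key picked by hand from the returned items.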

You mentioned the possibility of the table changing in parallel, and whether this has any effect on the result. Well, if according to the hash function a key X comes between C and D, and someone wrote to key X, it is indeed possible that your scan with ExclusiveStartKey=C would find X. However, since in your example we assume that A comes before C, a scan with ExclusiveStartKey=C can never return A - the scan looks for keys whose hash function values are greater than C's - not for newly written data, so A doesn't match.
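The concurrent-write case can be sketched with a small model. The hash values below are invented so that the scan order is A, B, C, D, E, and `scan_from` is a hypothetical helper, not a DynamoDB call.

```python
import bisect


def scan_from(table, start_key=None):
    """table: list of (hash_value, key) pairs kept sorted by hash_value."""
    if start_key is None:
        return [k for _, k in table]
    start_hash = next(h for h, k in table if k == start_key)
    # Resume strictly after the start key's position in hash order.
    pos = bisect.bisect_right(table, (start_hash, start_key))
    return [k for _, k in table[pos:]]


# Hypothetical hash values chosen so the scan order is A, B, C, D, E.
table = [(10, "A"), (20, "B"), (30, "C"), (40, "D"), (50, "E")]
before = scan_from(table, "C")

# A concurrent writer adds X, whose made-up hash falls between C and D.
bisect.insort(table, (35, "X"))
after = scan_from(table, "C")

print(before)  # ['D', 'E']
print(after)   # ['X', 'D', 'E']
```

X shows up because its hash sorts after C's. A never does, no matter when it was written, because its hash sorts before C's.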

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Nadav Har'El