Pagination in DynamoDB using Node.js?

I've read through AWS's docs on pagination. As they specify:

In a response, DynamoDB returns all the matching results within the scope of the Limit value. For example, if you issue a Query or a Scan request with a Limit value of 6 and without a filter expression, DynamoDB returns the first six items in the table that match the specified key conditions in the request (or just the first six items in the case of a Scan with no filter)

This means that, given a table called Questions with an attribute called difficulty (which can take any numeric value from 0 to 2), I might end up with the following conundrum:

  • A client makes a request, think GET /questions?difficulty=0&limit=3
  • I forward that 3 to the DynamoDB query, which might return 0 items, as the first 3 items in the collection might not have difficulty == 0
  • I then have to perform a new query to fetch more questions that match that criterion, without knowing whether I might return duplicates

How can I then paginate correctly based on a query? I'd like something where I get as many results as I asked for whilst keeping the correct offset.



Solution 1:[1]

Using async/await.

// Accumulates items across all pages (the global variable described below).
let allData = [];

const getAllData = async (params) => {

    console.log("Querying Table");
    let data = await docClient.query(params).promise();

    if (data['Items'].length > 0) {
        allData = [...allData, ...data['Items']];
    }

    if (data.LastEvaluatedKey) {
        // More pages remain: continue from where this page stopped.
        params.ExclusiveStartKey = data.LastEvaluatedKey;
        return await getAllData(params);

    } else {
        return data;
    }
}

I am using a global variable allData to collect all the data.

Calling this function is enclosed within a try-catch

try {

        await getAllData(params);
        console.log("Processing Completed");

        // console.log(allData);

    } catch(error) {
        console.log(error);
    }

I am using this from within a Lambda and it works fine.

The article here really helped and guided me. Thanks.

Solution 2:[2]

Here is an example of how to iterate over a paginated result set from a DynamoDB scan (can be easily adapted for query as well) in Node.js.

You could save the LastEvaluatedKey state serverside and pass an identifier back to your client, which it would send with its next request and your server would pass that value as ExclusiveStartKey in the next request to DynamoDB.

const AWS = require('aws-sdk');
AWS.config.logger = console;

const dynamodb = new AWS.DynamoDB({ apiVersion: '2012-08-10' });

let val = 'some value';

let params = {
  TableName: "MyTable",
  ExpressionAttributeValues: {
    ':val': {
      S: val,
    },
  },
  Limit: 1000,
  FilterExpression: 'MyAttribute = :val',
  // ExclusiveStartKey: thisUsersScans[someRequestParamScanID]
};

dynamodb.scan(params, function scanUntilDone(err, data) {
  if (err) {
    console.log(err, err.stack);
  } else {
    // do something with data

    if (data.LastEvaluatedKey) {
      params.ExclusiveStartKey = data.LastEvaluatedKey;

      dynamodb.scan(params, scanUntilDone);
    } else {
      // all results scanned. done!
      someCallback();
    }
  }
});
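
One way to hand the client an identifier for the next page, as suggested above, is to skip server-side state and return the LastEvaluatedKey itself as an opaque token. The sketch below is only an illustration: encodeCursor, decodeCursor and withCursor are hypothetical helper names, and it assumes the key contains only JSON-serializable values (binary key attributes would need extra handling). The token should be URL-encoded before it goes into a query string, and since it exposes key values you may want to sign or encrypt it.

```javascript
// Hypothetical helpers (not part of the AWS SDK): convert a LastEvaluatedKey
// into an opaque string token and back.
function encodeCursor(lastEvaluatedKey) {
  if (!lastEvaluatedKey) return null; // no further pages
  return Buffer.from(JSON.stringify(lastEvaluatedKey), 'utf8').toString('base64');
}

function decodeCursor(token) {
  if (!token) return undefined; // first page: no ExclusiveStartKey
  return JSON.parse(Buffer.from(token, 'base64').toString('utf8'));
}

// Returns a copy of params that resumes from the client's token, if any.
function withCursor(params, token) {
  const startKey = decodeCursor(token);
  return startKey ? { ...params, ExclusiveStartKey: startKey } : { ...params };
}
```

The server would call withCursor(params, req.query.cursor) before each scan or query, and send encodeCursor(data.LastEvaluatedKey) back alongside the items.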

Solution 3:[3]

Avoid using recursion to prevent call stack overflow. An iterative solution extending @Roshan Khandelwal's approach:

const getAllData = async (params) => {
  const _getAllData = async (params, startKey) => {
    if (startKey) {
      params.ExclusiveStartKey = startKey
    }
    return this.documentClient.query(params).promise()
  }
  let lastEvaluatedKey = null
  let rows = []
  do {
    const result = await _getAllData(params, lastEvaluatedKey)
    rows = rows.concat(result.Items)
    lastEvaluatedKey = result.LastEvaluatedKey
  } while (lastEvaluatedKey)
  return rows
}

Solution 4:[4]

I hope you've figured it out by now, but in case others find it useful: AWS provides QueryPaginator/ScanPaginator, which are as simple to use as below:

const paginator = new QueryPaginator(dynamoDb, queryInput);

for await (const page of paginator) {
    // do something with the first page of results
    break
}

See more details at https://github.com/awslabs/dynamodb-data-mapper-js/tree/master/packages/dynamodb-query-iterator

2022-05-19: For AWS SDK v3 see how to use paginateXXXX at this blog post https://aws.amazon.com/blogs/developer/pagination-using-async-iterators-in-modular-aws-sdk-for-javascript/
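
If you would rather not pull in a paginator library, the same for await pattern can be approximated with a small async generator. This is a hand-rolled sketch, not the QueryPaginator or SDK v3 implementation; paginate and fetchPage are hypothetical names, and fetchPage is assumed to be any function that takes request params and resolves to an object with Items and LastEvaluatedKey.

```javascript
// Hand-rolled async generator (an illustration, not an AWS API).
// fetchPage: hypothetical function, (params) => Promise<{ Items, LastEvaluatedKey }>
async function* paginate(fetchPage, params) {
  let startKey;
  do {
    // Copy params instead of mutating the caller's object.
    const request = startKey ? { ...params, ExclusiveStartKey: startKey } : params;
    const page = await fetchPage(request);
    yield page;
    startKey = page.LastEvaluatedKey; // undefined once the last page is reached
  } while (startKey);
}
```

With the v2 DocumentClient this could be used as for await (const page of paginate((p) => docClient.query(p).promise(), params)) { ... }, and breaking out of the loop stops any further requests.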

Solution 5:[5]

Query and Scan operations return LastEvaluatedKey in their responses. Absent concurrent insertions, you will not miss items nor will you encounter items multiple times, as long as you iterate calls to Query/Scan and set ExclusiveStartKey to the LastEvaluatedKey of the previous call.
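
Applied to the question's case (a filter plus a client-facing limit), this iteration can be sketched as a loop that keeps requesting pages until enough matching items have been collected. The names fetchFilteredPage, fetchPage and keyOf are hypothetical. The sketch assumes keyOf returns the item's full primary key (plus the index key attributes when querying an index), because when it stops in the middle of a page it resumes from the last item it kept rather than from LastEvaluatedKey.

```javascript
// Collect up to `limit` matching items, following LastEvaluatedKey across pages.
// fetchPage: hypothetical function, (params) => Promise<{ Items, LastEvaluatedKey }>
// keyOf:     hypothetical function, (item) => that item's key attributes
async function fetchFilteredPage(fetchPage, params, limit, keyOf) {
  const items = [];
  let startKey = params.ExclusiveStartKey;
  do {
    const request = startKey ? { ...params, ExclusiveStartKey: startKey } : params;
    const page = await fetchPage(request);
    for (const item of page.Items) {
      items.push(item);
      if (items.length === limit) {
        // Stopped mid-page: the next request should start after this item.
        return { items, nextKey: keyOf(item) };
      }
    }
    startKey = page.LastEvaluatedKey;
  } while (startKey);
  // Ran out of data before reaching the limit: there is no next page.
  return { items, nextKey: undefined };
}
```

The caller passes nextKey back as ExclusiveStartKey on the following request. If the limit is reached exactly on the final item of the table, the next request simply returns an empty list.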

Solution 6:[6]

Using async/await, returning the data in await. Elaboration on @Roshan Khandelwal's answer.

const getAllData = async (params, allData = []) => {
  const data = await dynamodbDocClient.scan(params).promise()

  if (data['Items'].length > 0) {
    allData = [...allData, ...data['Items']]
  }

  if (data.LastEvaluatedKey) {
    params.ExclusiveStartKey = data.LastEvaluatedKey
    return await getAllData(params, allData)
  } else {
    return allData
  }
}

Call inside a try/catch:

try {
        const data = await getAllData(params);
        console.log("my data: ", data);
    } catch(error) {
        console.log(error);
    }

Solution 7:[7]

To paginate a DynamoDB scan, pass the parameters like this:

var params = {  
        "TableName"                 : "abcd",
        "FilterExpression"          : "#someexperssion=:someexperssion",
        "ExpressionAttributeNames"  : {"#someexperssion":"someexperssion"},
        "ExpressionAttributeValues" : {":someexperssion" : "value"},
        "Limit"                     : 20,
        "ExclusiveStartKey"         : {"id": "9ee10f6e-ce6d-4820-9fcd-cabb0d93e8da"}
    };
DB.scan(params).promise();

where ExclusiveStartKey is the LastEvaluatedKey returned by the previous execution of this scan.

Solution 8:[8]

You can create a secondary index on difficulty and, when querying, set KeyConditionExpression to difficulty = 0, like this:

var params = {
    TableName: 'Questions',
    IndexName: 'difficulty-index',
    KeyConditionExpression: 'difficulty = :difficulty ',
    ExpressionAttributeValues: {':difficulty':0}
}
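
Because difficulty is then a key condition rather than a filter, Limit counts matching items, so a client limit of 3 yields up to 3 questions per request. A minimal sketch of building such a request follows; buildDifficultyQuery is a hypothetical helper, and the table name 'Questions', the index name 'difficulty-index' and DocumentClient-style plain values are assumptions carried over from the question and the snippet above.

```javascript
// Hypothetical helper: builds Query params for one page of questions of a
// given difficulty, read from the assumed 'difficulty-index' secondary index.
function buildDifficultyQuery(difficulty, limit, startKey) {
  const params = {
    TableName: 'Questions',
    IndexName: 'difficulty-index',
    KeyConditionExpression: 'difficulty = :difficulty',
    ExpressionAttributeValues: { ':difficulty': difficulty },
    Limit: limit, // applies to items matching the key condition
  };
  // Resume after the previous page's LastEvaluatedKey, when there is one.
  if (startKey) params.ExclusiveStartKey = startKey;
  return params;
}
```

For GET /questions?difficulty=0&limit=3 the server would run the query with buildDifficultyQuery(0, 3, previousLastEvaluatedKey) and return the response's LastEvaluatedKey to the client for the next page.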

Solution 9:[9]

You can also achieve this using recursion instead of a global variable, like:

const getAllData = async (params, allData = []) => {
    let data = await db.scan(params).promise();
    return (data.LastEvaluatedKey) ?
      getAllData({...params, ExclusiveStartKey: data.LastEvaluatedKey}, [...allData, ...data['Items']]) :
      [...allData, ...data['Items']];
};

Then you can simply call it like:

 let test = await getAllData({ "TableName": "test-table"}); // feel free to add try/catch
