How can I import data from AWS S3 to DynamoDB?

How can I import data into DynamoDB from a public dataset hosted on AWS S3 (this link; it is a public dataset)?

I have tried many ways to import the data, including AWS Data Pipeline and AWS Athena, but none of them worked. I also tried using Node.js to import the data, without success. I also downloaded the public dataset to my laptop, but I cannot find an import button in the DynamoDB console.

Could you recommend an efficient and low-cost way to import the data from S3 to DynamoDB?

Thanks!



Solution 1:[1]

Write a custom script to download the data and insert it record by record into DynamoDB. You can use the BatchWriteItem API to insert multiple records (up to 25) in a single API call. Note that this still consumes one write capacity unit per record inserted (assuming each record is under 1 KB).
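
As a rough sketch of that approach (not the answerer's own code), the following Node.js script loads a locally downloaded JSON file and writes it in batches of 25 with the AWS SDK v2 DocumentClient. The file name data.json, the table name my-table, and the region are assumptions; adjust them and the item shape to your dataset:

    // Sketch only: assumes data.json holds a JSON array of items whose
    // attributes match the table's key schema. Table name and region are placeholders.
    const fs = require('fs')
    const AWS = require('aws-sdk')

    AWS.config.region = 'us-east-1'            // placeholder region
    const docClient = new AWS.DynamoDB.DocumentClient()
    const ddbTable = 'my-table'                // placeholder table name

    const items = JSON.parse(fs.readFileSync('data.json', 'utf-8'))

    const run = async () => {
      // BatchWriteItem accepts at most 25 put requests per call
      for (let i = 0; i < items.length; i += 25) {
        const batch = items.slice(i, i + 25)
        const params = {
          RequestItems: {
            [ddbTable]: batch.map(item => ({ PutRequest: { Item: item } }))
          }
        }
        const result = await docClient.batchWrite(params).promise()
        // Items DynamoDB could not process come back in UnprocessedItems;
        // a production script should retry them with backoff.
        if (Object.keys(result.UnprocessedItems || {}).length > 0) {
          console.warn('Unprocessed items in batch starting at index', i)
        }
      }
    }

    run().catch(console.error)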

AWS Database Migration Service (DMS) can also do this for you, but your own script will cost nothing (beyond the DynamoDB provisioned writes).

Solution 2:[2]

This worked for me when pushing 90k records.

    const AWS = require('aws-sdk')
    AWS.config.region = process.env.AWS_REGION 
    const s3 = new AWS.S3()

    const docClient = new AWS.DynamoDB.DocumentClient()
    const ddbTable = "your-table" 

    // The Lambda handler
    exports.handler = async (event) => {
      console.log(JSON.stringify(event, null, 2))
      console.log('Using DDB table: ', ddbTable)

      await Promise.all(
        event.Records.map(async (record) => {
          try {
            console.log('Incoming record: ', record)

            // Get original text from object in incoming event
            const originalText = await s3.getObject({
              Bucket: record.s3.bucket.name,
              Key: record.s3.object.key
            }).promise()

            // Upload JSON to DynamoDB
            const jsonData = JSON.parse(originalText.Body.toString('utf-8'))
            await ddbLoader(jsonData)

          } catch (err) {
            console.error(err)
          }
        })
      )
    }

    // Load JSON data to DynamoDB table
    const ddbLoader = async (data) => {
      // Separate into batches for upload
      let batches = []
      const BATCH_SIZE = 25

      while (data.length > 0) {
        batches.push(data.splice(0, BATCH_SIZE))
      }

      console.log(`Total batches: ${batches.length}`)

      let batchCount = 0

      // Save each batch
      await Promise.all(
        batches.map(async (item_data) => {

          // Set up the params object for the DDB call
          const params = {
            RequestItems: {}
          }
          params.RequestItems[ddbTable] = []
  
          item_data.forEach(item => {
            for (let key of Object.keys(item)) {
              // An AttributeValue may not contain an empty string
              if (item[key] === '') 
                delete item[key]
            }

            // Build params
            params.RequestItems[ddbTable].push({
              PutRequest: {
                Item: {
                  ...item
                }
              }
            })
          })

          // Push to DynamoDB in batches
          try {
            batchCount++
            console.log('Trying batch: ', batchCount)
            const result = await docClient.batchWrite(params).promise()
            console.log('Success: ', result)
          } catch (err) {
            console.error('Error: ', err)
          }
        })
      )
    }
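
This handler is intended to run as a Lambda function triggered by S3 event notifications on the bucket that holds the dataset, so that event.Records contains S3 records. If you want to smoke-test it outside of Lambda, you can invoke the exported handler directly with a hand-built event; this is only a sketch, and it assumes the handler above is saved as index.js, with my-bucket and dataset.json standing in for your real bucket and object key (valid AWS credentials and region are still required):

    // Local smoke test for the Lambda handler above (assumptions: file is index.js,
    // 'my-bucket'/'dataset.json' are placeholders for the real bucket and key).
    const { handler } = require('./index')

    const mockEvent = {
      Records: [
        {
          s3: {
            bucket: { name: 'my-bucket' },
            object: { key: 'dataset.json' }
          }
        }
      ]
    }

    handler(mockEvent)
      .then(() => console.log('Done'))
      .catch(console.error)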

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: cementblocks
Solution 2: RichVel