How can I import data from AWS S3 to DynamoDB?
How can I import data from this public dataset on AWS S3 (this link) into DynamoDB?
I have tried several approaches: AWS Data Pipeline, AWS Athena, and a Node.js script, but none of them worked. I also downloaded the dataset to my laptop, but I cannot find an import button in the DynamoDB console.
Could you recommend an efficient, low-cost way to import the data from S3 to DynamoDB?
Thanks!
Solution 1:[1]
Write a custom script that downloads the data and inserts it record by record into DynamoDB. You can use the BatchWriteItem API to insert multiple records (up to 25) in a single API call. Note that this still consumes 1 write capacity unit per record inserted (assuming each record is under 1 KB).
AWS Database Migration Service (DMS) can also do this for you, but your own script will cost nothing (beyond DynamoDB provisioned writes).
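A minimal sketch of the batching step described above (the table name and record shape are illustrative; the actual `batchWrite` call is shown only as a comment, since it needs live AWS credentials):

```javascript
// Split an array of records into DynamoDB-sized batches
// (BatchWriteItem accepts at most 25 items per call).
function toBatches(records, batchSize = 25) {
  const batches = [];
  for (let i = 0; i < records.length; i += batchSize) {
    batches.push(records.slice(i, i + batchSize));
  }
  return batches;
}

// Example: 90 records split into 4 batches (25 + 25 + 25 + 15)
const batches = toBatches(Array.from({ length: 90 }, (_, i) => ({ id: i })));
console.log(batches.length);    // 4
console.log(batches[3].length); // 15

// Each batch would then be sent with something like:
// await docClient.batchWrite({
//   RequestItems: {
//     "your-table": batch.map(item => ({ PutRequest: { Item: item } }))
//   }
// }).promise()
```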
Solution 2:[2]
This worked for me pushing 90k records.
const AWS = require('aws-sdk')
AWS.config.region = process.env.AWS_REGION
const s3 = new AWS.S3()
const docClient = new AWS.DynamoDB.DocumentClient()
const ddbTable = "your-table"

// The Lambda handler
exports.handler = async (event) => {
  console.log(JSON.stringify(event, null, 2))
  console.log('Using DDB table: ', ddbTable)

  await Promise.all(
    event.Records.map(async (record) => {
      try {
        console.log('Incoming record: ', record)

        // Get original text from the object in the incoming event
        // (use this record's bucket and key, not event.Records[0])
        const originalText = await s3.getObject({
          Bucket: record.s3.bucket.name,
          Key: record.s3.object.key
        }).promise()

        // Upload JSON to DynamoDB
        const jsonData = JSON.parse(originalText.Body.toString('utf-8'))
        await ddbLoader(jsonData)
      } catch (err) {
        console.error(err)
      }
    })
  )
}

// Load JSON data into the DynamoDB table
const ddbLoader = async (data) => {
  // Separate into batches for upload (BatchWriteItem accepts at most 25 items)
  const batches = []
  const BATCH_SIZE = 25

  while (data.length > 0) {
    batches.push(data.splice(0, BATCH_SIZE))
  }

  console.log(`Total batches: ${batches.length}`)
  let batchCount = 0

  // Save each batch
  await Promise.all(
    batches.map(async (item_data) => {
      // Set up the params object for the DDB call
      const params = {
        RequestItems: {}
      }
      params.RequestItems[ddbTable] = []

      item_data.forEach(item => {
        for (let key of Object.keys(item)) {
          // An AttributeValue may not contain an empty string
          if (item[key] === '') delete item[key]
        }

        // Build params
        params.RequestItems[ddbTable].push({
          PutRequest: {
            Item: { ...item }
          }
        })
      })

      // Push to DynamoDB in batches
      try {
        batchCount++
        console.log('Trying batch: ', batchCount)
        const result = await docClient.batchWrite(params).promise()
        console.log('Success: ', result)
      } catch (err) {
        console.error('Error: ', err)
      }
    })
  )
}
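The empty-string cleanup inside ddbLoader exists because older DynamoDB SDK versions reject attribute values that are empty strings with a ValidationException. That step in isolation, with illustrative data:

```javascript
// Remove attributes whose value is an empty string, which older
// DynamoDB SDK versions reject with a ValidationException.
function stripEmptyStrings(item) {
  const cleaned = { ...item };
  for (const key of Object.keys(cleaned)) {
    if (cleaned[key] === '') delete cleaned[key];
  }
  return cleaned;
}

const cleaned = stripEmptyStrings({ id: 'a1', name: '', count: 0 });
console.log(cleaned); // { id: 'a1', count: 0 }
```

Note that only the empty string is dropped; falsy but valid values such as 0 are kept.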
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | cementblocks |
| Solution 2 | RichVel |
