DynamoDB costs arising from triggers
I have a workflow where I put files into an S3 bucket, which triggers a Lambda function. The Lambda function extracts some info about the file and inserts a row into a DynamoDB table for each file:
def put_filename_in_db(dynamodb, filekey, filename):
    table = dynamodb.Table(dynamodb_table_name)
    # masterclient, filetype, source_bucket_name and unixtimestamp come from the surrounding scope
    try:
        response = table.put_item(
            Item={
                'masterclient': masterclient,
                'filekey': filekey,
                'filename': filename,
                'filetype': filetype,
                'source_bucket_name': source_bucket_name,
                'unixtimestamp': unixtimestamp,
                'processed_on': None,
                'archive_path': None,
                'archived_on': None,
            }
        )
    except Exception as e:
        raise Exception(f"Error putting item in DynamoDB: {e}")
    return response
def get_files():
    bucket_content = s3_client.list_objects(Bucket=str(source_bucket_name), Prefix=Incoming_prefix)['Contents']
    file_list = []
    for k, v in enumerate(bucket_content):
        if v['Key'].endswith("zip") and not v['Key'].startswith(Archive_prefix):
            filekey = v['Key']
            filename = ...
            file_info = {"filekey": filekey, "filename": filename}
            file_list.append(file_info)
    logger.info(f'Found {len(file_list)} files to process: {file_list}')
    return file_list
def lambda_handler(event, context):
    for current_item in get_files():
        filekey = current_item['filekey']
        filename = current_item['filename']
        put_filename_in_db(dynamodb, filekey, filename)
    return {
        'statusCode': 200
    }
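For context, the handler above issues one PutItem request per file in the loop. A batched variant using boto3's batch_writer would look roughly like this (just a sketch for comparison; batch_writer groups the requests into BatchWriteItem calls but each item still consumes its own write units, and put_filenames_in_db_batched is a hypothetical helper, not code I'm running):
import boto3

dynamodb = boto3.resource('dynamodb')

def put_filenames_in_db_batched(file_list, table_name):
    table = dynamodb.Table(table_name)
    # batch_writer buffers items and sends them as BatchWriteItem calls,
    # retrying unprocessed items; it does not reduce per-item write units.
    with table.batch_writer() as batch:
        for item in file_list:
            batch.put_item(
                Item={
                    'filename': item['filename'],
                    'filekey': item['filekey'],
                }
            )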
This is how my DynamoDB table is defined in Terraform:
resource "aws_dynamodb_table" "filenames" {
name = local.dynamodb_table_filenames
billing_mode = "PAY_PER_REQUEST"
#read_capacity = 10
#write_capacity = 10
hash_key = "filename"
stream_enabled = true
stream_view_type = "NEW_IMAGE"
attribute {
name = "filename"
type = "S"
}
}
resource "aws_lambda_event_source_mapping" "allow_dynamodb_table_to_trigger_lambda" {
event_source_arn = aws_dynamodb_table.filenames.stream_arn
function_name = aws_lambda_function.trigger_stepfunction_lambda.arn
starting_position = "LATEST"
}
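With this mapping in place, the stream hands the downstream function batches of records. A single INSERT record looks roughly like this (trimmed and with placeholder values, just to show the shape the second Lambda receives):
sample_event = {
    'Records': [
        {
            'eventName': 'INSERT',
            'dynamodb': {
                'Keys': {'filename': {'S': 'example.zip'}},
                'NewImage': {
                    'filename': {'S': 'example.zip'},
                    'filekey': {'S': 'incoming/example.zip'},
                    'filetype': {'S': 'zip'},
                    'unixtimestamp': {'S': '1672531200'},
                    'masterclient': {'S': 'clientA'},
                    'source_bucket_name': {'S': 'my-source-bucket'},
                },
                'StreamViewType': 'NEW_IMAGE',
            },
        }
    ]
}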
New entries in the DynamoDB table trigger another Lambda function which contains this:
def parse_file_info_from_trigger(event):
    filename = event['Records'][0]['dynamodb']['Keys']['filename']['S']
    filetype = event['Records'][0]['dynamodb']['NewImage']['filetype']['S']
    unixtimestamp = event['Records'][0]['dynamodb']['NewImage']['unixtimestamp']['S']
    masterclient = event['Records'][0]['dynamodb']['NewImage']['masterclient']['S']
    source_bucket_name = event['Records'][0]['dynamodb']['NewImage']['source_bucket_name']['S']
    filekey = event['Records'][0]['dynamodb']['NewImage']['filekey']['S']
    return filename, filetype, unixtimestamp, masterclient, source_bucket_name, filekey

def start_step_function(event, state_machine_zip_files_arn):
    if event['Records'][0]['eventName'] == 'INSERT':
        filename, filetype, unixtimestamp, masterclient, source_bucket_name, filekey = parse_file_info_from_trigger(event)
        ......
    else:
        logger.info('This is not an INSERT event')
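Since the event source mapping can deliver several stream records per invocation, a version of the handler that walks the whole batch would look roughly like this (a sketch only; handle_insert is a hypothetical stand-in for kicking off the Step Function for one file):
def lambda_handler(event, context):
    # The stream may batch multiple records into one invocation,
    # so iterate over all of them instead of only Records[0].
    for record in event.get('Records', []):
        if record['eventName'] != 'INSERT':
            continue
        keys = record['dynamodb']['Keys']
        new_image = record['dynamodb']['NewImage']
        filename = keys['filename']['S']
        filekey = new_image['filekey']['S']
        # handle_insert is a hypothetical placeholder for starting the
        # Step Function execution for this file.
        handle_insert(filename, filekey)
    return {'statusCode': 200}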
However, the costs for this process are extremely high. When I tested with a single file loaded into S3, the overall DynamoDB cost for that day was $0.785. Extrapolating linearly to around 50 files a day, that would put my total cost at roughly $40 per day, which seems too high if we want to run the workflow daily.
Am I doing something wrong, or is DynamoDB just generally this expensive? If it's the latter, which part exactly is costing so much? Or is it because put_filename_in_db is running in a loop?
Source: Stack Overflow, licensed under CC BY-SA 3.0.