'Dynamodb design recommendations for partial comparisons
I currently have a Dynamodb table with the following columns - Id (Partition Key), NativeTimestampAndPath (Range Key), EntryTimestamp (Attribute). No secondary indexes as of now.
Sample data looks like this:
Id (Partition Key) | NativeTimestampAndPath (Range Key) | EntryTimestamp
event1#type1#eventid1 | 1960-01-01T00:00:00Z#event/event_name=event_1/event_type=type=1/day=15/eventid1-file1.json | 2012-01-15T00:00:00Z
event1#type1#eventid1 | 1970-01-01T00:00:00Z#event/event_name=event_1/event_type=type=1/day=16/eventid1-file2.json | 2012-01-16T00:00:00Z
event2#type1#eventid2 | 1970-01-01T00:00:00Z#event/event_name=event_2/event_type=type=1/day=17/eventid2-file1.json | 2012-01-17T00:00:00Z
event2#type1#eventid2 | 2000-01-01T00:00:00Z#event/event_name=event_2/event_type=type=1/day=18/eventid2-file2.json | 2012-01-18T00:00:00Z
Required access patterns:
- Conditional insert into table if
NativeTimestampAndPathvalue does not already exist in the table for a givenId. - Search for all
NativeTimestampAndPathvalues for a givenId. - Search for all
NativeTimestampAndPathvalues for a givenIdthat areequal/greater than/greater than or equal/less/less than or equalto a given timestamp. For example, if my given timestamp is1970-01-01T00:00:00Zand the condition is to find allNativeTimestampAndPathcontaining timestamps less or equal to the given timestamps, the search query should return only the first 3 rows from the above table. - Search for all
NativeTimestampAndPathvalues for a givenIdand only return 1NativeTimestampAndPathvalue that contains the max/min timestamp of all.
In summary, patterns 3 and 4 require a comparison on just the timestamp part of NativeTimestampAndPath column values.
While implementing this comparison logic, I noticed that I got inaccurate results for equal/greater than/greater than or equal/less/less than or equal conditions because Dynamodb is comparing the given timestamp with the entire value (timestamp and path included) in the NativeTimestampAndPath column. Inaccurate results are as a result of lexical comparison instead of timestamp to timestamp comparison.
Code for the comparison logic:
DynamoDBQueryExpression<Event> queryExpression = new DynamoDBQueryExpression<Event> ();
queryExpression
.withHashKeyValues (event)
.withConsistentRead (false);
queryExpression.setLimit (500);
FilterType.Conditional operator = filterDto.getTimestampOperator (); // comparison condition string
Instant givenTimestamp = filterDto.getTimestamp (); // given timestamp
if (operator != null && givenTimestamp != null) {
Condition rangeKeyCondition = new Condition ();
switch (operator) {
case eq:
rangeKeyCondition.withComparisonOperator (ComparisonOperator.BEGINS_WITH);
break;
case gt:
rangeKeyCondition.withComparisonOperator (ComparisonOperator.GT);
break;
case ge:
rangeKeyCondition.withComparisonOperator (ComparisonOperator.GE);
break;
case lt:
rangeKeyCondition.withComparisonOperator (ComparisonOperator.LT);
break;
case le:
rangeKeyCondition.withComparisonOperator (ComparisonOperator.LE.toString ());
break;
}
rangeKeyCondition.withAttributeValueList (new AttributeValue ().withS (String.valueOf (givenTimestamp)));
queryExpression.withRangeKeyCondition ("NativeTimestampAndPath", rangeKeyCondition);
}
FilterType.Aggregate aggregate = filterDto.getTimestampAggregator (); //specifies if max or min is required
if (aggregate != null) {
queryExpression.setLimit (1);
switch (aggregate) {
case max:
queryExpression.setScanIndexForward (false);
break;
case min:
queryExpression.setScanIndexForward (true);
break;
}
}
QueryResultPage<Event> paginated;
List<Event> result = new ArrayList<> ();
do {
paginated = dbMapper.queryPage (Event.class, queryExpression);
result.addAll (paginated.getResults ());
queryExpression.setExclusiveStartKey (paginated.getLastEvaluatedKey ());
} while (paginated.getLastEvaluatedKey () != null && aggregate == null);
return result;
Assuming that I want to:
- Keep the
Idas the partition key and theNativeTimestampAndPathas the range key - And only use query expressions (vs scan expressions) for access patterns 3 and 4
What can I do to this Dynamodb design to meet the required access pattern conditions? Any suggestions are greatly appreciated. Thanks!
Solution 1:[1]
Right, the sort key is going to be treated as a string. Is 1970-01-01T00:00:00Z#event/event_name=event_1/event_type=type=1/day=16/eventid1-file2.json less than or equal to 1970-01-01T00:00:00Z? No, it's slightly later in a lexical sort. It will be impossible to directly include any equality expectations here if you have extra data after the timestamp.
Here's the trick. Is that long string less than 1970-01-01T00:00:00Z$? Yes! Because $ is a character above # in ASCII. Therefore, less than this value with the dollar sign would actually count as less than or equal. You can use ! for the other direction.
I'm not sure what your issue is with #4. You can specify a Limit and control direction with ScanIndexForward. Ties where the timestamps are the same will be kind of randomly chosen, but that's unrelated to the data model here.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | hunterhacker |
