'Dynamodb design recommendations for partial comparisons

I currently have a Dynamodb table with the following columns - Id (Partition Key), NativeTimestampAndPath (Range Key), EntryTimestamp (Attribute). No secondary indexes as of now.

Sample data looks like this:

Id (Partition Key)         |     NativeTimestampAndPath (Range Key)                             |     EntryTimestamp
event1#type1#eventid1      |     1960-01-01T00:00:00Z#event/event_name=event_1/event_type=type=1/day=15/eventid1-file1.json    |   2012-01-15T00:00:00Z  
event1#type1#eventid1      |     1970-01-01T00:00:00Z#event/event_name=event_1/event_type=type=1/day=16/eventid1-file2.json    |   2012-01-16T00:00:00Z  
event2#type1#eventid2      |     1970-01-01T00:00:00Z#event/event_name=event_2/event_type=type=1/day=17/eventid2-file1.json    |   2012-01-17T00:00:00Z  
event2#type1#eventid2      |     2000-01-01T00:00:00Z#event/event_name=event_2/event_type=type=1/day=18/eventid2-file2.json    |   2012-01-18T00:00:00Z 

Required access patterns:

  1. Conditional insert into table if NativeTimestampAndPath value does not already exist in the table for a given Id.
  2. Search for all NativeTimestampAndPath values for a given Id.
  3. Search for all NativeTimestampAndPath values for a given Id that are equal/greater than/greater than or equal/less/less than or equal to a given timestamp. For example, if my given timestamp is 1970-01-01T00:00:00Z and the condition is to find all NativeTimestampAndPath containing timestamps less or equal to the given timestamps, the search query should return only the first 3 rows from the above table.
  4. Search for all NativeTimestampAndPath values for a given Id and only return 1 NativeTimestampAndPath value that contains the max/min timestamp of all.

In summary, patterns 3 and 4 require a comparison on just the timestamp part of NativeTimestampAndPath column values.

While implementing this comparison logic, I noticed that I got inaccurate results for equal/greater than/greater than or equal/less/less than or equal conditions because Dynamodb is comparing the given timestamp with the entire value (timestamp and path included) in the NativeTimestampAndPath column. Inaccurate results are as a result of lexical comparison instead of timestamp to timestamp comparison.

Code for the comparison logic:

DynamoDBQueryExpression<Event> queryExpression = new DynamoDBQueryExpression<Event> ();
queryExpression
        .withHashKeyValues (event)
        .withConsistentRead (false);
queryExpression.setLimit (500);
FilterType.Conditional operator = filterDto.getTimestampOperator (); // comparison condition string
Instant                givenTimestamp  = filterDto.getTimestamp (); // given timestamp
if (operator != null && givenTimestamp != null) {
    Condition rangeKeyCondition = new Condition ();
    switch (operator) {
        case eq:
            rangeKeyCondition.withComparisonOperator (ComparisonOperator.BEGINS_WITH);
            break;
        case gt:
            rangeKeyCondition.withComparisonOperator (ComparisonOperator.GT);
            break;
        case ge:
            rangeKeyCondition.withComparisonOperator (ComparisonOperator.GE);
            break;
        case lt:
            rangeKeyCondition.withComparisonOperator (ComparisonOperator.LT);
            break;
        case le:
            rangeKeyCondition.withComparisonOperator (ComparisonOperator.LE.toString ());
            break;
    }
    rangeKeyCondition.withAttributeValueList (new AttributeValue ().withS (String.valueOf (givenTimestamp)));
    queryExpression.withRangeKeyCondition ("NativeTimestampAndPath", rangeKeyCondition);
}
FilterType.Aggregate aggregate = filterDto.getTimestampAggregator (); //specifies if max or min is required
if (aggregate != null) {
    queryExpression.setLimit (1);
    switch (aggregate) {
        case max:
            queryExpression.setScanIndexForward (false);
            break;
        case min:
            queryExpression.setScanIndexForward (true);
            break;
    }
}
QueryResultPage<Event> paginated;
List<Event>            result = new ArrayList<> ();
do {
    paginated = dbMapper.queryPage (Event.class, queryExpression);
    result.addAll (paginated.getResults ());
    queryExpression.setExclusiveStartKey (paginated.getLastEvaluatedKey ());
} while (paginated.getLastEvaluatedKey () != null && aggregate == null);
return result;

Assuming that I want to:

  • Keep the Id as the partition key and the NativeTimestampAndPath as the range key
  • And only use query expressions (vs scan expressions) for access patterns 3 and 4

What can I do to this Dynamodb design to meet the required access pattern conditions? Any suggestions are greatly appreciated. Thanks!



Solution 1:[1]

Right, the sort key is going to be treated as a string. Is 1970-01-01T00:00:00Z#event/event_name=event_1/event_type=type=1/day=16/eventid1-file2.json less than or equal to 1970-01-01T00:00:00Z? No, it's slightly later in a lexical sort. It will be impossible to directly include any equality expectations here if you have extra data after the timestamp.

Here's the trick. Is that long string less than 1970-01-01T00:00:00Z$? Yes! Because $ is a character above # in ASCII. Therefore, less than this value with the dollar sign would actually count as less than or equal. You can use ! for the other direction.

I'm not sure what your issue is with #4. You can specify a Limit and control direction with ScanIndexForward. Ties where the timestamps are the same will be kind of randomly chosen, but that's unrelated to the data model here.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 hunterhacker