Understanding continuation tokens in AWS S3

I'd like to understand better how continuation tokens work in list_objects_v2(). Here is a piece of code that iterates through a large S3 bucket, storing the continuation tokens provided:

import boto3

S3C = boto3.client("s3")
BUCKET_NAME = "my-bucket"  # placeholder, replace with your bucket name

def transformer():
    # Page through the bucket, recording each continuation token S3 returns.
    tokens = []
    response = S3C.list_objects_v2(Bucket=BUCKET_NAME)
    while "NextContinuationToken" in response:
        token = response["NextContinuationToken"]
        tokens.append(token)
        response = S3C.list_objects_v2(Bucket=BUCKET_NAME, ContinuationToken=token)
    print(tokens)

What is the structure of these tokens under the hood? I noticed that if I rerun the function they are regenerated (they are not the same). Also, how would I get the token that marks the starting point for the first API call? My motivation is parallel computation: I'd like to see whether I can collect these tokens, ship them out somewhere as indices for separate workers, and still get a robust result. I'm a bit of a noob, so thanks for being patient :)



Solution 1:[1]

Unfortunately, that is not possible. The S3 list operation is strictly sequential: each continuation token comes from the previous response, so you cannot parallelize a single listing.
That said, there is a workaround if you need to list objects in a deep directory tree. List only one or two (or any number of) levels deep in the tree, and use each path you get back as the base for a separate list request.

For example:

/f1/f11/f111/obj.txt  
/f2/f22/f222/obj.txt  
/f2/f23/f233/obj.txt    

The first list request, one level deep, returns two paths, /f1 and /f2. You can then list each of them separately and process their objects in parallel.
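A minimal sketch of this prefix fan-out, not a definitive implementation. It assumes the client is a boto3 S3 client (or any object exposing `list_objects_v2` with the same keyword arguments), that keys use "/" as the separator and do not start with a leading slash (so `f1/f11/f111/obj.txt` rather than `/f1/f11/f111/obj.txt`), and that sharing one client across threads is acceptable. The function names are hypothetical, chosen for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def list_top_level_prefixes(s3_client, bucket):
    """Return the first-level "folders" of the bucket, e.g. ["f1/", "f2/"]."""
    prefixes = []
    kwargs = {"Bucket": bucket, "Delimiter": "/"}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        # With a Delimiter, S3 groups keys into CommonPrefixes.
        prefixes.extend(p["Prefix"] for p in response.get("CommonPrefixes", []))
        if not response.get("IsTruncated"):
            return prefixes
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


def list_keys_under(s3_client, bucket, prefix):
    """List every key under one prefix; this part is still sequential."""
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in response.get("Contents", []))
        if not response.get("IsTruncated"):
            return keys
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


def list_bucket_in_parallel(s3_client, bucket, max_workers=8):
    """Run one sequential listing per top-level prefix, in parallel threads."""
    prefixes = list_top_level_prefixes(s3_client, bucket)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(
            lambda prefix: list_keys_under(s3_client, bucket, prefix), prefixes
        )
        return dict(zip(prefixes, results))


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   keys_by_prefix = list_bucket_in_parallel(boto3.client("s3"), "my-bucket")
```

Objects stored directly at the bucket root (keys with no "/") are not picked up by this sketch, since they do not belong to any common prefix. The speed-up also depends on how evenly the objects are spread across the top-level prefixes.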

Hope this helps!

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Alex