Getting S3 objects' last modified datetimes with boto
I'm writing a Python script that uploads files to S3 using the boto library. I only want to upload changed files (which I can check by their "last modified" datetimes), but I can't find the boto API call that returns the last modified date.
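(For context, the upload-only-when-changed pattern being asked about might look roughly like this minimal boto3 sketch; the bucket name and file path are placeholders, and the object is assumed to already exist in S3.)
import os
import boto3
from datetime import datetime, timezone

s3 = boto3.client('s3')
bucket, key, path = 'my-bucket', 'data.csv', 'data.csv'  # placeholders

# Compare the local file's mtime (made timezone-aware) with the object's LastModified.
local_mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
remote_mtime = s3.head_object(Bucket=bucket, Key=key)['LastModified']

if local_mtime > remote_mtime:
    s3.upload_file(path, bucket, key)  # upload only when the local copy is newer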
Solution 1:[1]
Here's a snippet of Python/boto code that will print the last_modified attribute of all keys in a bucket:
>>> import boto
>>> s3 = boto.connect_s3()
>>> bucket = s3.lookup('mybucket')
>>> for key in bucket:
    print key.name, key.size, key.last_modified
index.html 13738 2012-03-13T03:54:07.000Z
markdown.css 5991 2012-03-06T18:32:43.000Z
>>>
Solution 2:[2]
Boto3 returns a datetime object for LastModified, so you shouldn't need to perform any tortuous string manipulations.
To compare LastModified to today's date (Python3):
import boto3
from datetime import datetime, timezone
today = datetime.now(timezone.utc)
s3 = boto3.client('s3', region_name='eu-west-1')
objects = s3.list_objects(Bucket='my_bucket')
for o in objects["Contents"]:
    # LastModified is an aware datetime; compare the date part with today's date
    if o["LastModified"].date() == today.date():
        print(o["Key"])
You just need to be aware that LastModified is timezone aware, so any date you compare with it must also be timezone aware, hence:
datetime.now(timezone.utc)
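For the upload-only-changed-files use case in the question, the same rule applies to the local side of the comparison. A minimal sketch (the file path is a placeholder, and objects is the list_objects response from the snippet above):
import os
from datetime import datetime, timezone

# getmtime() returns seconds since the epoch; converting with tz=timezone.utc
# yields an aware datetime that can be compared with boto3's LastModified values.
local_mtime = datetime.fromtimestamp(os.path.getmtime('data.csv'), tz=timezone.utc)

for o in objects["Contents"]:
    if o["Key"] == 'data.csv' and o["LastModified"] > local_mtime:
        print(o["Key"], "is newer in S3 than on disk")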
Solution 3:[3]
This works (thanks to jdennison above). After getting the key from S3:
import time
from time import mktime
from datetime import datetime
modified = time.strptime(key.last_modified, '%a, %d %b %Y %H:%M:%S %Z')
#convert to datetime
dt = datetime.fromtimestamp(mktime(modified))
Solution 4:[4]
For a single S3 object you can use the boto3 client's head_object() method, which is faster than list_objects_v2() for one object because less content is returned. The returned LastModified value is a datetime, as in all boto3 responses, and is therefore easy to process.
head_object() also exposes conditional request features tied to the object's modification time, which a list_objects() result does not give you (a sketch follows the snippet below).
import boto3
s3 = boto3.client('s3')
# head_object requires keyword arguments; bucket_name and object_key are placeholders
response = s3.head_object(Bucket=bucket_name, Key=object_key)
datetime_value = response["LastModified"]
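As a sketch of the conditional feature mentioned above (bucket and key names are placeholders): head_object() accepts an IfModifiedSince parameter, and S3 answers with a 304 when the object has not changed since that time, which boto3 surfaces as a ClientError.
import boto3
from datetime import datetime, timezone
from botocore.exceptions import ClientError

s3 = boto3.client('s3')
cutoff = datetime(2022, 1, 1, tzinfo=timezone.utc)  # example cutoff time

try:
    s3.head_object(Bucket='my-bucket', Key='data.csv', IfModifiedSince=cutoff)
    print('object was modified after the cutoff')
except ClientError as e:
    if e.response['ResponseMetadata']['HTTPStatusCode'] == 304:
        print('object has not been modified since the cutoff')
    else:
        raise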
Solution 5:[5]
Convert the last_modified attribute to a struct_time as shown below:
import time
for key in bucket.get_all_keys():
    modified = time.strptime(key.last_modified[:19], "%Y-%m-%dT%H:%M:%S")
This will give a time.struct_time(tm_year, tm_mon, tm_mday, tm_hour, tm_min, tm_sec, tm_wday, tm_yday, tm_isdst) tuple for each key in the S3 bucket
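If you then need a datetime to compare against, as the question does, here is a small sketch: the last_modified string is in UTC (note the trailing "Z"), so calendar.timegm() is the safer way to turn the struct_time into a timestamp (mktime(), used in Solution 3, would interpret it as local time).
import time
import calendar
from datetime import datetime, timezone

# key comes from the loop above; timegm() treats the struct_time as UTC
parsed = time.strptime(key.last_modified[:19], "%Y-%m-%dT%H:%M:%S")
dt = datetime.fromtimestamp(calendar.timegm(parsed), tz=timezone.utc)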
Solution 6:[6]
If you're using Django and django-storages, you can use an unofficial API in the s3boto backend:
>>> from storages.backends.s3boto import _parse_datestring
>>> _parse_datestring("Fri, 20 Jul 2012 16:57:27 GMT")
datetime.datetime(2012, 7, 21, 2, 57, 27)
Unfortunately as of django-storages 1.1.5, this gives a naive datetime. You need to use django.utils.timezone to create an aware version:
>>> from django.utils import timezone
>>> naive = _parse_datestring("Fri, 20 Jul 2012 16:57:27 GMT")
>>> timezone.make_aware(naive, timezone.get_current_timezone())
datetime.datetime(2012, 7, 21, 2, 57, 27, tzinfo=<DstTzInfo 'Australia/Brisbane' EST+10:00:00 STD>)
Solution 7:[7]
Using a Resource, you can get an iterator of all objects and then retrieve the last_modified attribute of an ObjectSummary.
import boto3
s3 = boto3.resource('s3')
bk = s3.Bucket(bucket_name)
[obj.last_modified for obj in bk.objects.all()][:10]
returns
[datetime.datetime(2020, 4, 17, 13, 23, 37, tzinfo=tzlocal()),
datetime.datetime(2020, 4, 17, 13, 23, 37, tzinfo=tzlocal()),
datetime.datetime(2020, 4, 17, 13, 23, 38, tzinfo=tzlocal()),
datetime.datetime(2020, 4, 17, 13, 23, 38, tzinfo=tzlocal()),
datetime.datetime(2020, 4, 17, 13, 23, 38, tzinfo=tzlocal()),
datetime.datetime(2020, 4, 17, 13, 23, 37, tzinfo=tzlocal()),
datetime.datetime(2020, 4, 17, 13, 23, 37, tzinfo=tzlocal()),
datetime.datetime(2020, 4, 17, 13, 20, 20, tzinfo=tzlocal()),
datetime.datetime(2020, 4, 20, 8, 30, 2, tzinfo=tzlocal()),
datetime.datetime(2020, 3, 26, 15, 33, 58, tzinfo=tzlocal())]
Solution 8:[8]
This is for the more recent S3 list_objects_v2. The boto3 client returns LastModified as a datetime.datetime, and one way to convert it is shown below.
Links: the boto3 documentation and the AWS S3 ListObjectsV2 API reference.
import datetime
from dateutil.tz import tzutc
# node s3 response '2019-06-17T18:42:57.000Z'
# python boto3 s3 response datetime.datetime(2019, 10, 1, 22, 41, 55, tzinfo=tzutc())
''' {'ETag': '"c8ba0ad5003832f63690ea8ff9b66052"',
'Key': 'SOMEFILE',
'LastModified': datetime.datetime(2019, 10, 2, 18, 50, 47, tzinfo=tzutc()),
'Size': 6390623,
'StorageClass': 'STANDARD'}
'''
l = datetime.datetime(2019, 10, 1, 22, 41, 55, tzinfo=tzutc())
# timestamp() gives epoch seconds; strftime('%s') is platform-dependent
get_last_modified = int(l.timestamp())
print(l)
print(get_last_modified)
Solution 9:[9]
import boto3
from boto3.session import Session
session = Session(aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)
s3 = session.resource('s3')
my_bucket = s3.Bucket(BUCKET_NAME)
for obj in my_bucket.objects.all():
    print('{} | {}'.format(obj.key, obj.last_modified))
Solution 10:[10]
You can get a single object's last modified date like this:
With resource
boto3.resource('s3').Object(<BUCKET_NAME>, <file_path>).last_modified
With client
boto3.client('s3').head_object(Bucket=<BUCKET_NAME>, Key=<file_path>)['LastModified']
Solution 11:[11]
You can sort the returned list of objects by the LastModified key:
import boto3
s3_client = boto3.client('s3')
s3_response = s3_client.list_objects(Bucket=BUCKET_NAME)
sorted_contents = sorted(s3_response['Contents'], key=lambda d: d['LastModified'], reverse=True)
sorted_contents[0].get('Key')
You can remove the reverse=True flag to get the earliest-modified object instead. You can also sort by the Size of the objects or any other property you want.
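For example, the same pattern applied to Size instead of LastModified (a small sketch continuing the snippet above):
largest_first = sorted(s3_response['Contents'], key=lambda d: d['Size'], reverse=True)
print(largest_first[0].get('Key'), largest_first[0].get('Size'))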
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
