Is there a way to get the number of objects in a Google Cloud Storage bucket using Python?
I need to get the number of files in a bucket of GCS.
I don't want to use list_blobs to read them one by one and increase a counter.
Is there something like metadata we can query?
I need to download all the files in the bucket and process them. Now I want to do that using threads, so I need to split the files into groups somehow.
The idea was to use list_blobs with an offset and a size, but to do that I need to know the total number of files.
Any ideas?
Thanks
Solution 1:[1]
There's no way to do a single metadata query to get the count. You could run a command like:
gsutil ls gs://my-bucket/** | wc -l
Note, however, that this command makes a number of bucket listing requests behind the scenes, which can take a long time if the bucket is large and is billed according to the number of operations it makes.
Solution 2:[2]
For those looking for a command line answer, you can use
gsutil du gs://pub | wc -l
Answering here as this was the first link I got when I searched.
References:
https://stackoverflow.com/a/18986955/6733421
https://cloud.google.com/storage/docs/gsutil/commands/du
Solution 3:[3]
I know the original question did not want to use .list_blobs() to count the number of files in a bucket, but since I didn't find a different way, I'm posting it here for reference, since it does work:
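from google.cloud import storage

# Credentials and project are inferred from the environment.
storage_client = storage.Client()
# list_blobs() returns a lazy iterator that pages through the bucket listing.
blobs_list = storage_client.list_blobs(bucket_or_name='name_of_your_bucket')
# Count the objects by exhausting the iterator.
print(sum(1 for _ in blobs_list))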
.list_blobs() returns an iterator, so this answer basically loops over the iterator and counts the elements.
If you only want to count the files within a certain folder in your bucket, you can use the prefix keyword:
blobs_list = storage_client.list_blobs(
    bucket_or_name='name_of_your_bucket',
    prefix='name_of_your_folder',
)
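Since counting this way still walks the whole listing, one possible refinement is to ask the API for less data per object. The sketch below is only an illustration: `count_blobs` is a hypothetical helper name, and it assumes the `fields` argument of `list_blobs` is used to request just the object name and the page token (the page token has to stay in the selector, otherwise pagination would stop after the first page). It does not reduce the number of listing requests, only the size of each response.

```python
def count_blobs(client, bucket_name, prefix=None):
    """Count objects in a bucket, optionally restricted to a name prefix.

    `client` is expected to be a google.cloud.storage.Client (or any object
    exposing a compatible list_blobs method).
    """
    blobs = client.list_blobs(
        bucket_name,
        prefix=prefix,
        # Partial response: only fetch names plus the token needed for paging.
        fields="items(name),nextPageToken",
    )
    # The iterator fetches pages lazily; exhausting it counts every object.
    return sum(1 for _ in blobs)
```

With the real client this would be called as `count_blobs(storage.Client(), 'name_of_your_bucket', prefix='name_of_your_folder')`.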
FYI: this question suggests a different method to solve this:
How can I get number of files from gs bucket using python
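For the threading use case in the question, the total does not need to be known in advance: the listing can be collected once and the resulting names divided into groups. The following is a minimal sketch under that assumption, not a definitive implementation. The helper names (`split_into_groups`, `list_blob_names`, `process_group`, `process_bucket`) are hypothetical, the per-object processing is left as a placeholder, and it assumes the list of names fits in memory.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_groups(names, num_groups):
    """Split names into at most num_groups contiguous, near-equal groups."""
    if num_groups < 1:
        raise ValueError("num_groups must be at least 1")
    size, extra = divmod(len(names), num_groups)
    groups, start = [], 0
    for i in range(num_groups):
        # The first `extra` groups receive one additional item.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty groups when there are fewer names than groups
            groups.append(names[start:end])
        start = end
    return groups


def list_blob_names(bucket_name, prefix=None):
    """Collect every object name in one pass over the paginated listing."""
    # Imported lazily so the grouping helper works without the client library.
    from google.cloud import storage

    client = storage.Client()
    return [blob.name for blob in client.list_blobs(bucket_name, prefix=prefix)]


def process_group(bucket_name, names):
    """Hypothetical worker: download each object in the group."""
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    for name in names:
        data = bucket.blob(name).download_as_bytes()
        # ... process `data` here (placeholder) ...
    return len(names)


def process_bucket(bucket_name, num_threads=8):
    """List once, split the names, and hand each group to a worker thread."""
    names = list_blob_names(bucket_name)
    groups = split_into_groups(names, num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        handled = pool.map(lambda group: process_group(bucket_name, group), groups)
        return sum(handled)
```

Splitting a list of names rather than paging by position is a deliberate choice here: the listing is paginated by page token, so the names have to be gathered before they can be divided evenly.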
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mike Schwartz |
| Solution 2 | Sai Chander |
| Solution 3 |
