SageMaker charge from ListBucket

Looking at the breakdown of charges from AWS SageMaker, I noticed that only about 30% of the total cost comes from actually running the instances. Surprisingly, about 50% comes from S3 (it shows up as ListBucket), and the remaining 20% is other overhead. Is there a way to reduce this large extra charge from S3?

To give more background: I run hundreds of training jobs, each roughly 3 hours long. The data is hundreds of pickle files packed into a single tar.gz archive of about 10 GB, which gets extracted on the instance. So if I run 1000 jobs on instances priced at $0.1/hr, I expect a charge of around $300 (1000 jobs * 3 hours * $0.1/hr). However, it ends up being close to $1000, with around $500 coming from "ListBucket"! Where does this come from? The S3 folder with the training data contains just a single compressed file, so why would ListBucket cost so much?
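To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The job count, duration, instance price, and charges are the approximate figures from above. The LIST request price ($0.005 per 1,000 requests) is an assumption based on S3 Standard pricing in us-east-1 and may not match the actual region or bill, and all variable names are illustrative.

```python
# Back-of-the-envelope check of the figures in the question.
jobs = 1000
hours_per_job = 3
instance_price_per_hour = 0.10  # USD per hour, from the question

# Expected compute cost: jobs * hours * hourly price.
expected_compute = jobs * hours_per_job * instance_price_per_hour

observed_total = 1000.0     # USD, approximate total bill
listbucket_charge = 500.0   # USD, approximate "ListBucket" line item

# ASSUMPTION: S3 Standard LIST price in us-east-1; check the pricing
# page for the region actually in use.
list_price_per_1000_requests = 0.005  # USD

# How many LIST requests that charge would imply at the assumed price.
implied_list_requests = listbucket_charge / list_price_per_1000_requests * 1000
implied_requests_per_job = implied_list_requests / jobs

print(f"Expected compute cost:      ${expected_compute:,.2f}")
print(f"Compute share of the bill:  {expected_compute / observed_total:.0%}")
print(f"Implied LIST requests:      {implied_list_requests:,.0f}")
print(f"Implied LIST requests/job:  {implied_requests_per_job:,.0f}")
```

If the assumed price is in the right range, a $500 ListBucket charge would correspond to on the order of 100 million LIST requests, or about 100,000 per job. That seems like far more than reading a single archive should need, which is why the charge is so puzzling.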



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
