Quickly finding the size of an S3 'folder'

We have S3 'folders' (objects sharing a key prefix within a bucket) containing millions and millions of files, and we want to figure out the size of each folder.

Writing my own .NET application to list the S3 objects was easy enough, but each request returns at most 1000 keys, so it's taking forever.

Using S3Browser to look at a 'folder's' properties is taking a long time too, I'm guessing for the same reason.

I've had this .NET application running for a week - I need a better solution.

Is there a faster way to do this?



Solution 1:[1]

The AWS CLI's ls command can do this:

aws s3 ls --summarize --human-readable --recursive s3://$BUCKETNAME/$PREFIX --region $REGION
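If the total is needed inside a script rather than on screen, the summary can be reduced to a single number. The following is a minimal sketch, assuming the summary is printed without --human-readable in the `Total Objects:` / `Total Size:` form shown in the sample output under Solution 3; the helper name `summary_bytes` is hypothetical, not part of the AWS CLI.

```shell
# summary_bytes: hypothetical helper (not an AWS CLI feature).
# Reads `aws s3 ls --summarize` output on stdin and prints only the
# byte count from the "Total Size:" summary line.
summary_bytes() {
  awk -F': *' '/^[[:space:]]*Total Size:/ {print $2}'
}

# Usage (assumes BUCKETNAME and PREFIX are set; --human-readable is
# omitted so the size stays in plain bytes):
# aws s3 ls --summarize --recursive "s3://$BUCKETNAME/$PREFIX" | summary_bytes
```

Note that this only trims the output; the CLI still has to list every object under the prefix, so it is no faster than the plain command.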

Solution 2:[2]

Seems like AWS added a menu item where it's possible to see the size:

[Screenshot: size of S3 folder]

Solution 3:[3]

I prefer using the AWS CLI. I find that the web console often times out when there are too many objects.

  • replace s3://bucket/ with where you want to start from.
  • relies on awscli, awk, tail, and some bash-like shell
# Only lines marked PRE are prefixes; prefixes containing spaces are not handled.
start=s3://bucket/ && \
for prefix in $(aws s3 ls "$start" | awk '$1 == "PRE" {print $2}'); do
  echo ">>> $prefix <<<"
  aws s3 ls "$start$prefix" --recursive --summarize | tail -n2
done

or in one-line form:

start=s3://bucket/ && for prefix in $(aws s3 ls "$start" | awk '$1 == "PRE" {print $2}'); do echo ">>> $prefix <<<"; aws s3 ls "$start$prefix" --recursive --summarize | tail -n2; done

Output looks something like:

$ start=s3://bucket/ && for prefix in $(aws s3 ls "$start" | awk '$1 == "PRE" {print $2}'); do echo ">>> $prefix <<<"; aws s3 ls "$start$prefix" --recursive --summarize | tail -n2; done
>>> extracts/ <<<
Total Objects: 23
   Total Size: 10633858646
>>> hackathon/ <<<
Total Objects: 2
   Total Size: 10004
>>> home/ <<<
Total Objects: 102
   Total Size: 1421736087
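With many prefixes, it can help to rank them by size. The following is a rough sketch that post-processes the loop output above, assuming it keeps the exact `>>> prefix <<<` / `Total Size:` layout shown and that prefixes contain no spaces; the helper name `rank_prefixes` is hypothetical.

```shell
# rank_prefixes: hypothetical helper that reads the per-prefix loop
# output on stdin and prints "<bytes> <prefix>" lines, largest first.
rank_prefixes() {
  awk '
    /^>>> /       { prefix = $2 }        # remember the current prefix
    /Total Size:/ { print $NF, prefix }  # emit byte count and prefix
  ' | sort -rn
}

# Usage: pipe the whole loop into the helper, e.g.
# for prefix in ...; do ...; done | rank_prefixes
```

For the sample output above, this would list extracts/ first, then home/, then hackathon/.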

Solution 4:[4]

If they're throttling you to 1000 keys per request, I'm not certain how PowerShell is going to help, but if you want the size of a bunch of folders, something like this should do it.

Save the following in a file called Get-FolderSize.ps1:

param
(
    [Parameter(Position=0, ValueFromPipeline=$True, Mandatory=$True)]
    [ValidateNotNullOrEmpty()]
    [System.String]
    $Path
)

function Get-FolderSize ($_ = (get-item .))  {
  Process {
    $ErrorActionPreference = "SilentlyContinue"
    #? { $_.FullName -notmatch "\\email\\?" }  <-- Exclude folders.
    $length = (Get-ChildItem $_.fullname -recurse | Measure-Object -property length -sum).sum
    $obj = New-Object PSObject
    $obj | Add-Member NoteProperty Folder ($_.FullName)
    $obj | Add-Member NoteProperty Length ($length)
     Write-Output $obj
  }
}

Function Class-Size($size)
{

    IF($size -ge 1GB)
    {
        "{0:n2}" -f  ($size / 1GB) + " GB"
    }
    ELSEIF($size -ge 1MB)
    {
        "{0:n2}" -f  ($size / 1MB) + " MB"
    }
    ELSE
    {
        "{0:n2}" -f  ($size / 1KB) + " KB"
    }
}

Get-ChildItem $Path | Get-FolderSize | Sort-Object -Property Length -Descending | Select-Object -Property Folder, Length | Format-Table -Property Folder, @{ Label="Size of Folder" ; Expression = {Class-Size($_.Length)} }

Usage: .\Get-FolderSize.ps1 -Path \path\to\your\folders

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Foolish Brilliance
Solution 2    Filippo Loddo
Solution 3    debugme
Solution 4    Jeffrey Eldredge