Compute the size of a directory in R

I want to compute the size of a directory in R. I tried to use the file.info function, but unfortunately it follows symbolic links, so my results are biased:

# returns the wrong size: files reached through symlinks are counted as well
sum(file.info(list.files(path = '/my/directory/', recursive = TRUE, full.names = TRUE))$size)

How do I compute the size of a directory in R so that it matches what Linux reports, e.g. with du -s?

Thanks



Solution 1:[1]

# Windows: recursively list all files and sum their Length (size in bytes)
system('powershell -noprofile -command "ls -r|measure -s Length"')
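
If you want the total back in R rather than just printed to the console, here is a minimal sketch (assuming Windows with powershell on the PATH; dir_size_win is a hypothetical helper, not part of the original answer):

# run the same PowerShell pipeline and capture the byte total as a number
dir_size_win <- function(path) {
  cmd <- sprintf(
    "powershell -noprofile -command \"(ls -r '%s' | measure -s Length).Sum\"",
    path
  )
  as.numeric(system(cmd, intern = TRUE))
}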

References:

  1. https://technet.microsoft.com/en-us/library/ff730945.aspx
  2. Get Folder Size from Windows Command Line
  3. https://stat.ethz.ch/R-manual/R-devel/library/base/html/system.html
  4. https://superuser.com/questions/217773/how-can-i-check-the-actual-size-used-in-an-ntfs-directory-with-many-hardlinks

You can also leverage Cygwin if you have it; this lets you run Linux commands and get comparable results. There is also a nice Sysinternals-based approach in the last link above.

Solution 2:[2]

I finally used this:

# Unix-like systems only; reports the disk usage of the current working directory
system('du -s')
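
To get the number back into R, a minimal sketch (assuming GNU du, whose -b flag reports the apparent size in bytes; BSD/macOS du does not support -b):

# du prints "SIZE<TAB>PATH"; split on the tab and keep the size field
out <- system("du -sb /my/directory", intern = TRUE)
size_bytes <- as.numeric(strsplit(out, "\t")[[1]][1])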

Solution 3:[3]

A handy solution; it can be very useful for checking a package's size, for example.

dir_size <- function(path, recursive = TRUE) {
  stopifnot(is.character(path))
  files <- list.files(path, full.names = TRUE, recursive = recursive)
  vect_size <- sapply(files, file.size)
  # file.size() returns NA for unreadable files, so drop those from the sum
  sum(vect_size, na.rm = TRUE)
}

cat(dir_size(find.package("Rcpp"))/10**6, "MB")
#> 14.81649 MB

Created on 2021-06-26 by the reprex package (v2.0.0)
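
Note that dir_size() still follows symbolic links, which was the asker's original complaint. A minimal sketch of a variant that skips symlinked files, assuming a platform where Sys.readlink() is supported (dir_size_no_links is a hypothetical name, not part of the original answer):

dir_size_no_links <- function(path) {
  files <- list.files(path, full.names = TRUE, recursive = TRUE)
  # Sys.readlink() returns "" for regular files, the target path for
  # symlinks, and NA on error or where symlinks are unsupported
  links <- Sys.readlink(files)
  regular <- files[!is.na(links) & links == ""]
  sum(file.size(regular), na.rm = TRUE)
}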

Solution 4:[4]

"file.size" return the actual size, size on disk is the actual amount of space being taken up on the disk. check this to understand the difference . https://superuser.com/questions/66825/what-is-the-difference-between-size-and-size-on-disk try this for size of all files:

files <- list.files(path_of_directory, full.names = TRUE)
vect_size <- sapply(files, file.size)
size_files <- sum(vect_size)
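
If you do want an approximation of size on disk instead, a rough sketch (the 4096-byte allocation unit is an assumption; the real block size depends on the filesystem):

# round each file up to a whole allocation unit before summing
block_size <- 4096
size_on_disk <- sum(ceiling(vect_size / block_size) * block_size, na.rm = TRUE)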

Solution 5:[5]

Recently I had to deal with this problem, and here is my code:

library(pacman)
p_load(fs, tidyfst)  # fs for fast file listings, tidyfst for data.table verbs

# time the listing; note dir_info() only lists the top level unless recurse = TRUE
sys_time_print({
  dir_info(your_directory_path) -> your_dir_info
})

your_dir_info %>%
  summarise_dt(size = sum(size, na.rm = TRUE))

When I first ran the code above, it took about 3 minutes to scan 52 GB of data spread across 174,731 files. When I ran it again later, it took less than 6 seconds, presumably because the filesystem metadata was already cached. This is amazing.
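
The same idea works with fs alone if you would rather skip the tidyfst dependency (a sketch; your_directory_path is the placeholder used above):

library(fs)
info <- dir_info(your_directory_path, recurse = TRUE)
sum(info$size, na.rm = TRUE)  # returns a human-readable fs_bytes value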

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Community
Solution 2 Carmellose
Solution 3
Solution 4 islem
Solution 5 Hope