Progress bar in data.table aggregate action

ddply has a .progress argument that shows a progress bar while it runs. Is there an equivalent for data.table in R?



Solution 1:[1]

Following up on @jangorecki's excellent answer, here's a way to use a text progress bar:

library(data.table)
dt = data.table(a=1:4, b=c("a","b"))
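grpn = uniqueN(dt$b)  # number of groups, used as the bar's maximum
pb <- txtProgressBar(min = 0, max = grpn, style = 3)
# .GRP is the current group number; Sys.sleep() only slows the demo so the bar is visible
dt[, {setTxtProgressBar(pb, .GRP); Sys.sleep(0.5); sum(a)}, b]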
close(pb)
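The code above counts groups with uniqueN(dt$b), which covers a single grouping column. A minimal sketch for grouping by several columns, assuming a hypothetical toy table with grouping columns b and d: uniqueN() also accepts a data.table together with a by argument and returns the number of distinct combinations, i.e. the number of groups.

```r
library(data.table)

# Hypothetical toy table grouped by two columns
dt <- data.table(a = 1:8, b = c("a", "b"), d = rep(c("x", "y"), each = 4))

# Count distinct (b, d) combinations: this is the number of groups `by` will produce
grpn <- uniqueN(dt, by = c("b", "d"))

pb <- txtProgressBar(min = 0, max = grpn, style = 3)
res <- dt[, {
  setTxtProgressBar(pb, .GRP)  # advance the bar to the current group number
  sum(a)
}, by = .(b, d)]
close(pb)

print(res)
```

The bar's maximum then matches the grouping no matter how many columns are listed in by.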

Solution 2:[2]

Following up again on @jangorecki's great answer.

If you don't want to flood your terminal, you can write an external function equivalent to jangorecki's that does a modulus check and prints only when .GRP is divisible by a given number, mod. An if statement can also be written directly inside the j braces (nested braces are fine in R); moving the check into a function just keeps the j expression short.

# Print the percentage done, but only when the group number is a multiple of mod
progress = function(.GRP, grpn, mod) {
  if (!(.GRP %% mod)) {
    cat("progress", .GRP/grpn*100, "%\n")
  }
}

Then call it inside j. Here mod = 1000, so the percentage is printed only once every 1000 groups.

dt[, {progress(.GRP, grpn, 1000); sum(a)}, b]
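The same modulus check also works written inline in j, without a helper function. A minimal sketch, assuming a hypothetical toy table dt2 with ten groups and a print interval of 5; it uses .NGRP (the total number of groups) instead of a precomputed grpn.

```r
library(data.table)

# Hypothetical toy table with ten groups
dt2 <- data.table(a = 1:10, g = 1:10)
mod <- 5  # print every 5th group

res2 <- dt2[, {
  # An if statement inside the j braces runs once per group
  if (.GRP %% mod == 0) cat("progress", .GRP / .NGRP * 100, "%\n")
  sum(a)  # the last expression is the value returned for the group
}, by = g]
```

This prints two lines, "progress 50 %" and "progress 100 %".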

Solution 3:[3]

Following up on @jangorecki's and the other great answers, you can use the data.table special symbol .NGRP (the total number of groups) instead of computing grpn separately:

dt[, {cat("progress",.GRP/.NGRP*100,"%\n"); sum(a)}, b]
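.NGRP can also size a txtProgressBar, so no group count is needed before the call. A sketch under the assumption that creating the bar lazily on the first group is acceptable; the state environment is a hypothetical holder introduced here so that the bar created inside j can still be closed afterwards.

```r
library(data.table)

dt <- data.table(a = 1:4, b = c("a", "b"))

# Hypothetical holder: environments are modified in place, so j can store the bar here
state <- new.env()

res3 <- dt[, {
  # On the first group .NGRP is already known, so create the bar with that maximum
  if (.GRP == 1L) state$pb <- txtProgressBar(min = 0, max = .NGRP, style = 3)
  setTxtProgressBar(state$pb, .GRP)
  sum(a)
}, by = b]

# Guard against the case where j never ran (no groups), so no bar exists
if (!is.null(state$pb)) close(state$pb)
```

The trade-off is a slightly busier j expression in exchange for not computing the number of groups twice.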

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Zach
Solution 2 Eliot Behr
Solution 3 Eric Aya