Suppress capturing progress bar updates into the log file [R]
I have several functions that perform parallel looped operations on my input data (using the doParallel and foreach packages). Inside those functions I included a progress bar, created with utils::txtProgressBar(), to monitor the execution.
I decided to wrap those functions into an R package and to write a wrapper function that runs the entire pipeline at once [explained below]. Inside this wrapper, I want a code section that produces a log file.
Unfortunately, with this approach the test log files are very large, because the progress bars are captured and stored each time they refresh. I would like to save into the log file only the error message (if one is thrown; my functions have assertions written with assertthat and checkmate) and the final message from each stage (produced by cat(); a statement such as "XXX stage completed").
Below is some pseudocode showing the main idea of the wrapper and the progress bars.
wrapper_function <- function(arg1, arg2, arg3, arg4, save_dir) {
  # console message
  cat('Welcome to PKG_name', as.character(utils::packageVersion("PKG_name")), '\n',
      'Pipeline initialized:', as.character(Sys.time()), '\n', '\n')
  # Create a log file
  if (dir.exists(file.path(save_dir))) {
    log_filename <- paste(format(Sys.time(), "%Y-%m-%d_%H-%M-%S"), "_PKG_name.log", sep = "")
    log_filepath <- file.path(save_dir, log_filename, fsep = .Platform$file.sep)
    log_file <- file(log_filepath, open = "a")
    sink(log_file, append = TRUE, split = TRUE, type = 'output')
    # stop logging and close the connection when the wrapper exits
    on.exit({sink(file = NULL, type = 'output'); close(log_file)})
  }
  example_output <- example_looped_function(arg1, arg2, arg3, arg4)
  example_output2 <- example_looped_function(example_output)
  example_output3 <- example_looped_function(example_output2, arg1, arg2)
  pipeline_output <- function_4(example_output2)
  cat('Processing finished.\n')
  cat('Thank you for using PKG_name\n')
  # the sink is removed by on.exit(), so it is not re-opened here
  return(pipeline_output)
}
# Example of the progress bar inside the functions:
pb <- utils::txtProgressBar(min = 0, max = length(index_list), style = 3, width = 50, char = "=")
for (indx in seq_along(index_list)) {
  example_loop <- foreach::foreach(something) %dopar% some_function(something, something_else)
  utils::setTxtProgressBar(pb, indx)
}
close(pb)
I would be grateful for help in keeping the progress bar out of the log files.
[explanation] There are reasons why I would like to keep both options: running the entire pipeline at once by launching the wrapper function, or performing only a certain step of the analysis by calling a particular function.
Solution 1:[1]
For those who may encounter a similar problem in R, here is the solution I found:
- I rewrote wrapper_function: instead of calling the individual functions one by one, I put the code they execute directly into the wrapper.
- I stopped the sink with sink(file=NULL) before each loop, so that the progress bar output is not written to the log, and restarted it after the loop.
Below is pseudocode with the rationale:
wrapper_function <- function(..., num_cores, save_dir) {
  # console message
  cat('Welcome to PKG_name', as.character(utils::packageVersion("PKG_name")), '\n',
      'Pipeline initialized:', as.character(Sys.time()), '\n', '\n')
  # Create a log file
  if (dir.exists(file.path(save_dir))) {
    log_filename <- paste(format(Sys.time(), "%Y-%m-%d_%H-%M-%S"), "_PKG_name.log", sep = "")
    log_filepath <- file.path(save_dir, log_filename, fsep = .Platform$file.sep)
    log_file <- file(log_filepath, open = "a")
    sink(log_file, append = TRUE, split = TRUE, type = 'output')
    # stop logging and close the connection when the wrapper exits
    on.exit({sink(file = NULL, type = 'output'); close(log_file)})
  }
  # start cluster
  doParallel::registerDoParallel(cores = num_cores)
  ###################################################
  # function with loop
  ###################################################
  # header for progress bar
  cat('Doing something something...\n')
  # SINK #1 <- THIS DOES THE TRICK: stop logging before the progress bar starts
  sink(file = NULL, type = 'output')
  close(log_file)
  # progress bar
  pb <- utils::txtProgressBar(min = 0, max = 10, style = 3, width = 50, char = "=")
  # loop
  for (x in 1:10) {
    # do something in parallel using %dopar%
    utils::setTxtProgressBar(pb, x)
  }
  close(pb)
  # SINK #2 <- THIS DOES THE TRICK: resume logging to the same file
  log_file <- file(log_filepath, open = "a")
  sink(log_file, append = TRUE, split = TRUE, type = 'output')
  # after all of the functions with parallel computing
  doParallel::stopImplicitCluster()
  cat('Processing finished.\n')
  cat('Thank you for using PKG_name\n')
  # the sink is removed by on.exit(), so it is not re-opened here
  return(pipeline_output)
}
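The two sink() calls marked "THIS DOES THE TRICK" above could probably be wrapped in a small helper, so that the existing functions would not need to be inlined into the wrapper. The following is only a minimal sketch: the helper name run_unlogged() and the temporary log path are hypothetical and not part of the original answer, and it passes the file name to sink() instead of an open connection so that sink() opens and closes the file itself.

```r
# Hypothetical helper (not from the original answer): evaluate `expr` with the
# output sink removed, then resume appending to the same log file.
run_unlogged <- function(expr, log_path) {
  sink(file = NULL, type = "output")  # pop the current output sink
  # resume logging when the helper exits, even if `expr` throws an error
  on.exit(sink(log_path, append = TRUE, split = TRUE, type = "output"), add = TRUE)
  invisible(expr)  # `expr` is evaluated lazily here, while nothing is sinked
}

# Demonstration with a temporary log file (assumed path, for illustration only)
log_path <- tempfile(fileext = ".log")
sink(log_path, append = TRUE, split = TRUE, type = "output")
cat("Stage 1 started\n")
run_unlogged({
  pb <- utils::txtProgressBar(min = 0, max = 10, style = 3, width = 50, char = "=")
  for (i in 1:10) utils::setTxtProgressBar(pb, i)  # placeholder for the real loop
  close(pb)
}, log_path)
cat("Stage 1 completed\n")
sink(file = NULL, type = "output")  # stop logging

log_lines <- readLines(log_path)
```

In this sketch the progress bar still appears on the console, while log_lines holds only the two cat() messages.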
With this approach, the log file records the messages written with cat() without capturing the progress bar updates.
May not be the neatest and prettiest, but this works.
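A possibly simpler alternative, which is not part of the original answer, is to send the progress bar to stderr. utils::txtProgressBar() has a file argument, and sink(type = 'output') diverts only stdout, so a bar drawn on stderr() should never reach the log. A minimal sketch, assuming a temporary log path and a placeholder loop body:

```r
# Assumed temporary log file, for illustration only
log_path2 <- tempfile(fileext = ".log")
sink(log_path2, append = TRUE, split = TRUE, type = "output")

cat("Doing something...\n")
# file = stderr() is the only change to the progress bar call
pb <- utils::txtProgressBar(min = 0, max = 10, style = 3, width = 50,
                            char = "=", file = stderr())
for (i in 1:10) {
  Sys.sleep(0.01)  # placeholder for the parallel work
  utils::setTxtProgressBar(pb, i)
}
close(pb)
cat("XXX stage completed\n")

sink(file = NULL, type = "output")  # stop logging
logged <- readLines(log_path2)
```

With this variant the functions can stay as they are, apart from the extra argument. The same mechanism means that error messages, which R also writes to stderr, are not captured by an output sink either, so they would have to be logged separately.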
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ramen |
