Progress bar inside a map function in R - Web scraping

I have been trying to include a progress bar inside a map function while web scraping.

First, I collect all the article links; this step returns results within seconds.

library(rvest)
library(dplyr)
library(stringr)
library(purrr)

news_america_mg_01 <- paste0("https://www.americamineiro.com.br/paginas/page/", 
                                 seq(from = 1, to = 4)) %>% 
  map(. %>% 
        read_html() %>% 
        html_nodes(".gdlr-blog-title a") %>% 
        html_attr("href") %>% 
        as.data.frame())
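Because news_america_mg_01 is a list of four data frames, any progress bar added in the next step restarts once per listing page. A minimal sketch that flattens the links into one character vector instead, assuming the listing pages keep the .gdlr-blog-title a selector used above. extract_links() and collect_links() are hypothetical helper names, and html_elements() is the rvest 1.0 name for html_nodes():

```r
library(rvest)
library(purrr)
library(magrittr)

# Hypothetical helper: article URLs found on one parsed listing page
extract_links <- function(doc) {
  doc %>%
    html_elements(".gdlr-blog-title a") %>%
    html_attr("href")
}

# Hypothetical helper: read every listing page (URL or literal HTML) and
# return one flat, de-duplicated character vector of article URLs
collect_links <- function(page_urls) {
  page_urls %>%
    map(read_html) %>%
    map(extract_links) %>%
    unlist(use.names = FALSE) %>%
    unique()
}

# Usage (needs network access):
# article_urls <- collect_links(paste0("https://www.americamineiro.com.br/paginas/page/", 1:4))
```

With a single vector, the next step can iterate over all articles with one progress bar instead of four.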

Second, and this is where I want the progress bar, I extract the title, date, and text from each of the collected links.

news_america_mg_02 <- news_america_mg_01 %>%
  map(. %>%
        # Title
        mutate(title = map_chr(., ~ read_html(.x) %>%
                                 html_node("h1.gdlr-blog-title.entry-title") %>%
                                 html_text()),
               # Date
               data = map_chr(., ~ read_html(.x) %>%
                                html_node(".gdlr-info .updated a") %>%
                                html_text()),
               # Text
               text = map_chr(., ~ read_html(.x) %>%
                                html_node(".size-large+ p") %>%
                                html_text())))

Thanks in advance!!
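The chain above calls read_html() three times for every URL, once per column, so each page is downloaded three times. A minimal sketch that downloads each article once and pulls all three fields from the same parsed document; scrape_article() and article_urls are hypothetical names, and the CSS selectors are copied unchanged from the question:

```r
library(rvest)
library(tibble)

# Hypothetical helper: parse one article (URL or literal HTML) once and
# return a one-row tibble; a selector with no match yields NA, not an error
scrape_article <- function(page) {
  doc <- read_html(page)
  text_of <- function(css) html_text(html_element(doc, css))
  tibble(
    url   = page,
    title = text_of("h1.gdlr-blog-title.entry-title"),
    data  = text_of(".gdlr-info .updated a"),
    text  = text_of(".size-large+ p")
  )
}

# Usage (needs network access), given a character vector article_urls:
# articles <- dplyr::bind_rows(lapply(article_urls, scrape_article))
```

Since each call now does exactly one download, a progress bar that ticks once per call maps directly onto pages fetched.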



Solution 1:[1]

Create a wrapper around purrr::map_chr() that advances a progress bar (here one from the progress package) every time the mapped function is called. Credit: James Atkin's post.

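map_chr_progress <- function(.x, .f, ...) {
  # Turn a formula (~ ...) or function into a regular function
  .f <- purrr::as_mapper(.f, ...)
  # One tick per element of .x; force = TRUE draws the bar even when
  # the output stream does not look like a terminal
  pb <- progress::progress_bar$new(
    total = length(.x),
    format = " [:bar] :current/:total (:percent) eta: :eta",
    force = TRUE
  )
  # Advance the bar, then run the original function
  f <- function(...) {
    pb$tick()
    .f(...)
  }
  # map_chr() has no .id argument, so none is forwarded
  purrr::map_chr(.x, f, ...)
}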
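If you would rather not depend on the progress package, the same wrapper pattern works with the text progress bar that ships with base R (utils::txtProgressBar()). A minimal sketch; map_chr_txt_progress() is a hypothetical name, and the Sys.sleep() call in the usage line only stands in for a slow read_html() call:

```r
library(purrr)

# Hypothetical variant of the wrapper using base R's text progress bar
map_chr_txt_progress <- function(.x, .f, ...) {
  .f <- purrr::as_mapper(.f, ...)
  # txtProgressBar() requires max > min, so guard against an empty .x
  pb <- utils::txtProgressBar(min = 0, max = max(1L, length(.x)), style = 3)
  on.exit(close(pb))  # close the bar even if .f throws an error
  i <- 0
  f <- function(...) {
    i <<- i + 1                     # count calls in the enclosing scope
    utils::setTxtProgressBar(pb, i) # redraw the bar at the new position
    .f(...)
  }
  purrr::map_chr(.x, f, ...)
}

# Usage: Sys.sleep() simulates a slow download
res <- map_chr_txt_progress(c("a", "b", "c"), ~ { Sys.sleep(0.1); toupper(.x) })
```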

Then you can use map_chr_progress() in place of map_chr() inside the mutate() call of your dplyr chain.
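Note that the question maps over a list of four data frames and calls map_chr() three times in each, so a per-call wrapper draws a separate bar for every one of those calls. As an alternative to the wrapper, purrr 1.0.0 and later accept a .progress argument in map_chr() and the other map functions. A minimal sketch assuming purrr >= 1.0.0 and a flat character vector of article URLs; scrape_titles() and article_urls are hypothetical names, and only the title column is shown because data and text follow the same pattern:

```r
library(dplyr)
library(purrr)   # the .progress argument needs purrr >= 1.0.0
library(rvest)

# Hypothetical helper: one row per article URL; purrr draws a single
# named progress bar for the whole column
scrape_titles <- function(urls) {
  tibble(url = urls) %>%
    mutate(
      title = map_chr(
        url,
        ~ read_html(.x) %>%
          html_element("h1.gdlr-blog-title.entry-title") %>%
          html_text(),
        .progress = "Scraping titles"
      )
    )
}

# Usage (needs network access):
# titles <- scrape_titles(article_urls)
```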

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jeff Parker