Scrape a list of websites simultaneously using rvest

I am trying to scrape multiple product catalogues. Each link points to a different product.

webpages is a data frame containing the links.

webpages
"https............"
"https............"
"https............"

I have the following code:

for (i in webpages){
    book_page <- read_html(link) 
}

I got this error:

Error: x must be a string of length 1

How can I resolve it?



Solution 1:[1]

A for loop does not download multiple websites at the same time, as the title of your question asks. However, you can use a parallelization package, e.g. pbmcapply:

library(rvest)
library(readr)
#> 
#> Attaching package: 'readr'
#> The following object is masked from 'package:rvest':
#> 
#>     guess_encoding
library(pbmcapply)
#> Loading required package: parallel

webpages <- list(
  "http://example.com",
  "https://stackoverflow.com/",
  "https://github.com/"
)

# download 3 webpages at the same time
contents <- pbmclapply(webpages, read_file, mc.cores = 3)
contents_html <- lapply(contents, read_html)
contents_html[[1]]
#> {html_document}
#> <html>
#> [1] <head>\n<title>Example Domain</title>\n<meta charset="utf-8">\n<meta http ...
#> [2] <body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is for use ...

Created on 2022-03-01 by the reprex package (v2.0.1)

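read_html must be executed in the main process rather than in the workers: the documents it returns hold external pointers, which are no longer valid once they are passed back from a worker process. This is why the workers only download the raw text and the parsing happens afterwards.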
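As for the error in the question itself: read_html expects exactly one string per call, and a for loop over a data frame iterates over its columns, so each iteration sees a whole column of links instead of a single link. A minimal sequential sketch follows, assuming the links sit in a column named webpages (inferred from the printed output in the question); read_pages is a hypothetical helper name, not part of rvest.

```r
library(rvest)

# Hypothetical helper: parse each element of a character vector separately,
# so that read_html() always receives a single string.
read_pages <- function(urls) {
  urls <- as.character(urls)  # also copes with a factor column or a list
  lapply(urls, read_html)
}

# Assumed layout of the question's data frame: one column named `webpages`.
# Pass the column, not the data frame itself:
# book_pages <- read_pages(webpages$webpages)
```

The result is a list with one parsed document per link, which can be fed to html_elements() and similar functions in the same way as contents_html above.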

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1