Scrape a list of websites simultaneously using rvest
I am trying to scrape multiple product catalogues, where each link points to a different product.
webpages is a data frame containing the links.
webpages
"https............"
"https............"
"https............"
I have the following code:
for (i in webpages){
book_page <- read_html(link)
}
I got this error: Error: x must be a string of length 1
How can I resolve it?
Solution 1:[1]
A for loop does not download multiple websites at the same time, as the title of your question requires. However, you can use a parallelization package, e.g. pbmcapply:
library(rvest)
library(readr)
#>
#> Attaching package: 'readr'
#> The following object is masked from 'package:rvest':
#>
#> guess_encoding
library(pbmcapply)
#> Loading required package: parallel
webpages <- list(
"http://example.com",
"https://stackoverflow.com/",
"https://github.com/"
)
# download 3 webpages at the same time
contents <- pbmclapply(webpages, read_file, mc.cores = 3)
contents_html <- lapply(contents, read_html)
contents_html[[1]]
#> {html_document}
#> <html>
#> [1] <head>\n<title>Example Domain</title>\n<meta charset="utf-8">\n<meta http ...
#> [2] <body>\n<div>\n <h1>Example Domain</h1>\n <p>This domain is for use ...
Created on 2022-03-01 by the reprex package (v2.0.1)
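As for the error itself, one likely cause is that read_html received the whole column of links instead of a single link. A for loop over a data frame visits its columns, not its rows. The sketch below illustrates this without any network access; the column name link and the example URLs are assumptions, since the question does not show them.

```r
# Hypothetical stand-in for the asker's data frame; the column name
# `link` and the URLs are assumptions made for illustration.
webpages <- data.frame(
  link = c(
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c"
  ),
  stringsAsFactors = FALSE
)

# Looping over the data frame visits each column once, so `i` is the
# whole character vector of links rather than one link.
per_column <- integer(0)
for (i in webpages) {
  per_column <- c(per_column, length(i))
}

# Looping over the column itself yields one string per iteration,
# which is the shape of input that read_html() expects.
per_link <- integer(0)
for (link in webpages$link) {
  per_link <- c(per_link, length(link))
}

print(per_column)
print(per_link)
```

With rvest loaded, something like lapply(webpages$link, read_html) would then read the pages one after another, each call receiving a single string.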
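Only the download (read_file) runs in parallel here. read_html must be executed in the main process to avoid pointer errors: the parsed document holds an external pointer that is not valid once it is sent back from a worker process.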
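If the parsing should also happen in parallel, one possible workaround is to parse inside each worker and return only plain R values such as character vectors. This is a minimal sketch under a few assumptions: it uses parallel::mclapply from base R instead of pbmclapply, literal HTML strings stand in for downloaded pages so it runs offline, and extract_title is a hypothetical helper. mclapply relies on forking, so on Windows mc.cores would have to be 1.

```r
library(rvest)
library(parallel)

# Literal HTML strings stand in for downloaded pages (assumption, so
# that the sketch does not need network access).
pages <- c(
  "<html><head><title>First page</title></head><body></body></html>",
  "<html><head><title>Second page</title></head><body></body></html>"
)

# Hypothetical helper: parse inside the worker, but return only plain
# text. A character vector can be sent back to the main process,
# whereas the parsed document itself cannot be used there.
extract_title <- function(html) {
  doc <- read_html(html)
  html_text(html_element(doc, "title"))
}

# Run the helper on two forked workers and flatten the result.
titles <- unlist(mclapply(pages, extract_title, mc.cores = 2))
print(titles)
```

The trade-off is that everything needed from a page has to be extracted inside the worker, because the parsed document is not available afterwards.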
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
