Why is my loop not working when I web scrape in R?
library(tidyverse)
library(rvest)
library(htmltools)
library(xml2)
library(dplyr)
#import using read_html
results <- read_html("https://www.artemis.bm/deal-directory/")
#Results are in a list - can look at list by running 'results' in the console
#Extract the issuers' information
issuers <- results %>% html_nodes("#table-deal a") %>% html_text()
cedent <- results %>% html_nodes("td:nth-child(2)") %>% html_text()
risks <- results %>% html_nodes("td:nth-child(3)") %>% html_text()
size <- results %>% html_nodes("td:nth-child(4)") %>% html_text()
date <- results %>% html_nodes("td:nth-child(5)") %>% html_text()
#This scrapes all of the links for each issuer page
url <- results %>% html_nodes("#table-deal a") %>% html_attr("href")
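As a side note, the hrefs scraped above happen to be absolute URLs on this site; if a page returned relative links instead, they would need to be resolved before read_html() could fetch them. A minimal base-R sketch (the path below is illustrative, not scraped output; xml2::url_absolute() does this more robustly):

```r
# Resolve a relative href against the site's base URL before fetching.
# The path here is an illustrative example, not real scraped output.
base <- "https://www.artemis.bm"
rel  <- "/deal-directory/cape-lookout-re-ltd-series-2022-1/"

# keep the href as-is if it is already absolute, otherwise prepend the base
abs_url <- ifelse(grepl("^https?://", rel), rel, paste0(base, rel))
abs_url
# "https://www.artemis.bm/deal-directory/cape-lookout-re-ltd-series-2022-1/"
```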
#getting data from within the links
get_placement = function(url_link) {
  # note: this hard-coded read_html() call shadows the url_link argument,
  # so the function ignores whatever link is actually passed in
  url_link = read_html("https://www.artemis.bm/deal-directory/cape-lookout-re-ltd-series-2022-1/")
  issuer_page = read_html(url_link)
  placement = issuer_page %>% html_nodes("#info-box li:nth-child(3)") %>%
    html_text()
}
This code works, and the last part of get_placement returns the information I am after (the placement section): whichever link I put in, it gives me the placement for that issuer. However, when I try to loop it, it does not work.
#here is my issue
get_placement = function(url_link) {
  issuer_page = read_html(url_link)
  placement = issuer_page %>% html_nodes("#info-box li:nth-child(3)") %>%
    html_text()
  return(placement)
}
This only gives me one value, but I need the placement information for all 833 links.
issuer_placement = sapply(url, FUN = get_placement)
When I try to use sapply I get this message:
Browse[1]> issuer_placement = sapply(url, FUN = get_placement)
Error during wrapup: no applicable method for 'read_xml' applied to an object of class "name"
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
and this output:
function (con, open = "r", blocking = TRUE, ...)
.Internal(open(con, open, blocking))
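The "no applicable method for 'read_xml' applied to an object of class \"name\"" error suggests that what reached read_html() was not a character URL (here the call was also made from inside the Browse[1]> debugger, which can pass unevaluated symbols around). A quick offline check of the input's type, using an illustrative one-element vector, can rule this out:

```r
# read_html() needs character URLs; verify the vector's type before looping.
# This single URL is an illustrative stand-in for the 833 scraped links.
url <- c("https://www.artemis.bm/deal-directory/cape-lookout-re-ltd-series-2022-1/")

stopifnot(is.character(url))  # would fail for factors, symbols, or list columns
class(url)
# "character"
```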
Solution 1
This worked for me without any problem
issue_placement <- lapply(url, function(u) {
  tryCatch(return(get_placement(u)),
           error = function(e) return("Not retrieved - error"),
           warning = function(w) return("Not retrieved - warning"))
})
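The tryCatch() wrapper above turns any per-link failure into a sentinel string instead of aborting the whole loop. The same pattern can be demonstrated offline with a stub scraper (fake_get_placement is invented here purely for illustration):

```r
# Stub standing in for get_placement(): errors on non-character input,
# just as read_html() would.
fake_get_placement <- function(u) {
  if (!is.character(u)) stop("not a character URL")
  paste("placement for", u)
}

# Wrapper: a failure on one link becomes a sentinel string,
# so the loop keeps going over the remaining links.
safe_get <- function(u) {
  tryCatch(fake_get_placement(u),
           error   = function(e) "Not retrieved - error",
           warning = function(w) "Not retrieved - warning")
}

res <- sapply(list("https://example.com/deal-1", 42), safe_get)
res[2]
# "Not retrieved - error"
```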
When I pushed issue_placement into a data.table (see below), I found 330 unique results and no errors/warnings:
data.table::data.table(placement = unlist(issue_placement))[,.N, placement]
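Once issue_placement lines up with the vectors scraped earlier, everything can be assembled into one data frame. A sketch using illustrative one-element stand-ins for the real scrape results (the real run has 833 entries):

```r
# Illustrative stand-ins for the scraped vectors, one element each.
issuers         <- c("Cape Lookout Re Ltd. (Series 2022-1)")
cedent          <- c("Example cedent")
risks           <- c("Example peril")
size            <- c("Example size")
date            <- c("Example date")
issue_placement <- list("Example placement text")

# One row per deal; unlist() flattens the lapply() result into a vector.
deals <- data.frame(
  issuer    = issuers,
  cedent    = cedent,
  risk      = risks,
  size      = size,
  date      = date,
  placement = unlist(issue_placement),
  stringsAsFactors = FALSE
)
nrow(deals)
# 1
```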
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | langtang |
