Why is my loop not working when I web scrape in R?

library(tidyverse)  #loads dplyr, among others
library(rvest)      #loads xml2 as a dependency

#import using read_html

results <- read_html("https://www.artemis.bm/deal-directory/")

#results is a parsed HTML document - inspect it by running 'results' in the console

#The issuers' information extracted

issuers <- results %>% html_nodes("#table-deal a") %>% html_text()
cedent <- results %>% html_nodes("td:nth-child(2)") %>% html_text()
risks <- results %>% html_nodes("td:nth-child(3)") %>% html_text()
size <- results %>% html_nodes("td:nth-child(4)") %>% html_text()
date <- results %>% html_nodes("td:nth-child(5)") %>% html_text()

#This scrapes all of the links for each issuer page

url <- results %>% html_nodes("#table-deal a") %>% html_attr("href")
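Assuming the five vectors scraped above come back the same length (one entry per table row), they can be combined into a single data frame for later joining with the placement data. The snippet below uses short toy stand-ins for the scraped vectors so the pattern is visible offline; `deals` and the toy values are illustrative, not from the site:

```r
library(tibble)

# Toy stand-ins for the scraped vectors (hypothetical values)
issuers <- c("Cape Lookout Re Ltd.", "Example Re Ltd.")
cedent  <- c("NCJUA/NCIUA", "Example Insurer")
risks   <- c("NC wind", "US quake")
size    <- c("$330m", "$100m")
date    <- c("Mar 2022", "Apr 2022")

# One row per deal, one column per scraped field
deals <- tibble(issuer = issuers, cedent = cedent,
                risk = risks, size = size, date = date)
nrow(deals)  # 2
```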

#getting data from within the links

get_placement = function(url_link) {
  #hard-coded test link - the url_link argument is ignored in this version
  issuer_page = read_html("https://www.artemis.bm/deal-directory/cape-lookout-re-ltd-series-2022-1/")
  placement = issuer_page %>% html_nodes("#info-box li:nth-child(3)") %>%
    html_text()
  placement
}

This code works, and the last bit from get_placement gets the information I am after (the placement section): whichever link I put in, it returns the placement for that deal. However, when I try to loop over the links it does not work.

#here is my issue

get_placement = function(url_link) {
  issuer_page = read_html(url_link)
  placement = issuer_page %>% html_nodes("#info-box li:nth-child(3)") %>%
    html_text()
  return(placement)
}

This only gives me one value per call, but I need the placement information from all 833 links:

issuer_placement = sapply(url, FUN = get_placement)

When I try to use sapply, I get this error:

Browse[1]> issuer_placement = sapply(url, FUN = get_placement)
Error during wrapup: no applicable method for 'read_xml' applied to an object of class "name"
Error: no more error handlers available (recursive errors?); invoking 'abort' restart

and

function (con, open = "r", blocking = TRUE, ...) 
.Internal(open(con, open, blocking))
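One plausible reading of the error: "no applicable method for 'read_xml' applied to an object of class \"name\"", together with the printed body `function (con, open = "r", blocking = TRUE, ...)`, suggests that inside the debug browser the symbol `url` resolved to base R's `url()` connection function rather than to the scraped character vector. This is an assumption, not a confirmed diagnosis, but the collision is easy to demonstrate offline, and renaming the vector (`deal_urls` is a name chosen here for illustration) rules it out:

```r
# base R already defines url() as a connection function, so the name
# `url` exists even in a session where no vector was ever assigned
is.function(url)   # TRUE - this is base::url()

# A distinctly named vector cannot be confused with the function
deal_urls <- c("https://www.artemis.bm/deal-directory/cape-lookout-re-ltd-series-2022-1/")
is.character(deal_urls)  # TRUE
```

With the rename in place, the loop becomes `sapply(deal_urls, get_placement)` and no ambiguity is possible.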


Solution 1:[1]

This worked for me without any problems:

issue_placement <- lapply(url, function(u) {
  tryCatch(return(get_placement(u)),
           error=function(e) return("Not retrieved - error"),
           warning=function(w) return("Not retrieved - warning"))
})

When I pushed issue_placement into a data.table (see below), I found 330 unique results and no errors/warnings:

data.table::data.table(placement = unlist(issue_placement))[,.N, placement]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 langtang