How to fix subscript out of bounds error when scraping with polite?

I'm trying to use library(polite) to scrape data from a website, but I'm getting the error "Error in ind_html[[1]] : subscript out of bounds". Here's what I'm doing:

library(tidyverse)
library(lubridate)
library(janitor)
library(rvest)
library(httr)
library(polite)

url <- "https://cew.georgetown.edu/cew-reports/roi2022/"
url_bow <- polite::bow(url)
url_bow
ind_html <-
  polite::scrape(url_bow) %>%  
  rvest::html_nodes("table_div") %>% 
  rvest::html_table(fill = TRUE) 
ind_tab <- 
  ind_html[[1]] %>% 
  make_clean_names()

ROI_TABLE <- ind_tab %>%
  bind_rows() %>%
  as_tibble()

I think the error has to do with ind_html[[1]] but I do not know how to fix it. Thank you for any help!



Solution 1:[1]

If you are trying to get the table shown on that page, we can skip the HTML and read the CSV file directly:

library(readr)  # provides read_csv(); also attached by library(tidyverse)
df <- read_csv('https://cewgeorgetown.github.io/collegeROI-2022/ROIforWeb0222.csv')

# A tibble: 4,419 x 45
   Institution        State Level     `Predominant degr~ Control    `10-year NPV ra~ `10-year NPV` `15-year NPV ra~ `15-year NPV` `20-year NPV ra~ `20-year NPV`
   <chr>              <chr> <chr>     <chr>              <chr>                 <dbl>         <dbl>            <dbl>         <dbl>            <dbl>         <dbl>
 1 Alaska Career Col~ AK    2-year    Certificate        Private f~             2318        135000             2707        261000             2856        375000
 2 Alaska Pacific Un~ AK    4-year    Bachelor's         Private n~             3537         87000             2433        274000             1760        443000
 3 Alaska Vocational~ AK    Less tha~ Certificate        Public                   63        316000              240        458000              476        587000
 4 University of Ala~ AK    4-year    Bachelor's         Public                 2590        124000             1547        312000             1232        484000
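
As for the error itself: a selector written as "table_div", with no "#" or "." prefix, is read as a tag name, so html_nodes() matches nothing, html_table() returns an empty list, and ind_html[[1]] has nothing to index. Below is a minimal offline sketch of that behaviour. The inline HTML is a hypothetical stand-in for the real page, and it assumes the table there is filled in by JavaScript and is therefore absent from the static HTML:

```r
library(rvest)

# Hypothetical stand-in for the page: the container exists, but it holds no
# <table> in the static HTML (an assumption about the real site).
page <- read_html('<html><body><div id="table_div"></div></body></html>')

# "table_div" is treated as a tag name, so no nodes match and
# html_table() returns an empty list.
tables <- page %>%
  html_nodes("table_div") %>%
  html_table()

# Indexing the empty list raises the error; capture its message here.
err <- tryCatch(tables[[1]], error = function(e) conditionMessage(e))

# Guard the index so an empty result is handled instead of crashing.
first_table <- if (length(tables) > 0) tables[[1]] else NULL
```

Checking length(ind_html) before indexing shows whether the selector matched anything. If it is 0, the data has to come from somewhere else, such as the CSV file above.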

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Nad Pat