How to fix subscript out of bounds error when scraping with polite?
I'm trying to use library(polite) to scrape terrific data from a website, but I'm receiving the error "Error in ind_html[[1]] : subscript out of bounds". Here's what I'm doing:
library(tidyverse)
library(lubridate)
library(janitor)
library(rvest)
library(httr)
library(polite)
url <- "https://cew.georgetown.edu/cew-reports/roi2022/"
url_bow <- polite::bow(url)
url_bow
ind_html <-
  polite::scrape(url_bow) %>%
  rvest::html_nodes("table_div") %>%
  rvest::html_table(fill = TRUE)
ind_tab <-
  ind_html[[1]] %>%
  make_clean_names()
ROI_TABLE <- ind_tab %>%
  bind_rows() %>%
  as_tibble()
I think the error has to do with ind_html[[1]] but I do not know how to fix it. Thank you for any help!
Solution 1:[1]
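The error means that ind_html is an empty list: "table_div" is read as a tag-name selector (an id would be written "#table_div"), so html_nodes() matches nothing and there is no first element for [[1]] to extract. If you are trying to scrape the table shown on that page, its data appears to be available as a CSV file, so we can skip the HTML and read the file directly:
df <- readr::read_csv('https://cewgeorgetown.github.io/collegeROI-2022/ROIforWeb0222.csv')
df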
# A tibble: 4,419 x 45
Institution State Level `Predominant degr~ Control `10-year NPV ra~ `10-year NPV` `15-year NPV ra~ `15-year NPV` `20-year NPV ra~ `20-year NPV`
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Alaska Career Col~ AK 2-year Certificate Private f~ 2318 135000 2707 261000 2856 375000
2 Alaska Pacific Un~ AK 4-year Bachelor's Private n~ 3537 87000 2433 274000 1760 443000
3 Alaska Vocational~ AK Less tha~ Certificate Public 63 316000 240 458000 476 587000
4 University of Ala~ AK 4-year Bachelor's Public 2590 124000 1547 312000 1232 484000
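If you would still like to see why the original code failed, the following is a minimal sketch rather than a reproduction of the real site. It assumes, hypothetically, that the static HTML contains an empty container with id "table_div" that JavaScript fills in later; the inline HTML string and all variable names are made up for illustration. It shows the difference between a tag selector and an id selector, and a length check that avoids indexing an empty list.

```r
library(rvest)

# Hypothetical stand-in for the page's static HTML: an empty container
# (assumption: the real table is filled in by JavaScript after loading).
page <- minimal_html('<div id="table_div"></div>')

# "table_div" is a tag-name selector; there is no <table_div> element,
# so the result is an empty node set.
by_tag <- html_elements(page, "table_div")
length(by_tag)

# "#table_div" selects by id and does find the container ...
by_id <- html_elements(page, "#table_div")
length(by_id)

# ... but the container holds no <table>, so html_table() returns an
# empty list, and tables[[1]] would raise "subscript out of bounds".
tables <- html_table(html_elements(page, "#table_div table"))

# Check the length before indexing instead of assuming a match.
if (length(tables) == 0) {
  message("No tables found in the static HTML; look for the underlying data file.")
} else {
  first_table <- tables[[1]]
}
```

The same length check can be placed before ind_html[[1]] in the question's code. Separately, janitor::make_clean_names() expects a character vector of names; for a data frame such as df above, janitor::clean_names() is the counterpart to use.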
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Nad Pat |
