R - Scrape a page and display a list of items

Please help me scrape the page,

https://support.sas.com/en/papers/proceedings-archive/sugi2005.html

to display the list of paper names. Below is what I have so far, but I do not know which class to use as the selector. Thanks for helping!

library(rvest)    # read_html(), html_nodes(), html_text(), %>%
library(stringr)  # str_trim()

pagetoread <- read_html("http://www2.sas.com/proceedings/sugi30/toc.html",n=500)

get_paper_names <- function(html){
  html %>% 
    html_nodes('.?????') %>% 
    html_text() %>% 
    str_trim() %>% 
    unlist()
}

get_paper_names(pagetoread)


Solution 1:[1]

We can do it like this:

library(rvest)  # also re-exports read_html() and %>%

read_html('https://support.sas.com/en/papers/proceedings-archive/sugi2005.html') %>%
  html_nodes('#par > div > div:nth-child(3) > div > div.par.parsys > div > div') %>%
  html_nodes('p') %>% html_nodes('cite') %>%  # paper names are in cite nodes inside p nodes
  html_text()

  [1] "PROC FORMAT ? Not Just Another Pretty Face"
  [2] "Efficiency Considerations Using the SAS System"
  [3] "Through the Looking Glass: Two Windows into SAS"
  [4] "Journeymen's Tools: Two Macros, ProgList and PutMvars, to Show Calling Sequence and Parameters of Routines"
  [5] "SAS Batch Portal Data Collection and Real-Time Monitoring Application of SAS Batch Jobs"
  [6] "How to Get SAS Data into PowerPoint with SAS9"

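I first copied the JS path of the highlighted element from the browser's developer tools and used it as the CSS selector. The paper names are located in cite nodes, which in turn are nested inside p nodes, so the pipeline selects p, then cite, and finally extracts the text.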
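To see the p-then-cite nesting in isolation, here is a minimal sketch that runs without network access. The HTML fragment, the titles in it, and the shortened selector 'p cite' are assumptions made up for illustration; they are not the markup of the real SAS page, which may need the longer selector shown in the answer.

```r
library(rvest)  # also re-exports read_html() and %>%

# Hypothetical fragment mimicking the nesting described above:
# each paper title sits in a cite node inside a p node.
sample_html <- '<html><body><div id="par">
  <p><cite>First Paper Title</cite> Author A</p>
  <p><cite>  Second Paper Title </cite> Author B</p>
</div></body></html>'

get_paper_names <- function(html) {
  html %>%
    html_nodes("p cite") %>%  # descendant selector: cite nodes inside p nodes
    html_text() %>%
    trimws()                  # base R; drops leading/trailing whitespace
}

paper_names <- get_paper_names(read_html(sample_html))
print(paper_names)
```

If a descendant selector like this also matches the real page, get_paper_names() from the question could be completed by replacing the '.?????' placeholder with it, but that depends on the page not using cite elsewhere.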

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1