Is there an abstract pattern I can use to web scrape journal abstracts in rvest?

I am new to web scraping so please forgive me if what I am looking for is not possible. I want to extract all the journal article abstracts from a large database.

I am able to generate all the links from the database.

library(rvest)

# Read the PanglaoDB papers page and extract its table of papers
pangiaoDB <- read_html('https://panglaodb.se/papers.html')
table <- pangiaoDB %>%
  html_node(xpath = '/html/body/div[2]/div[2]/table') %>%
  html_table()
# Build one https://doi.org/ link per DOI (paste0 is vectorised)
url <- paste0('https://doi.org/', table$DOI)
head(url)

The table has over 800 unique journals.

length(unique(table$Journal))
length(table$Journal)

The abstracts are tucked away in various ways, but for the most part I have found them at xpath = '//*[@id="3179475"]/section' and xpath = '//*[@id="Abs1"]'. The latter is less of an issue, but how can I generate a relative XPath for abstracts at the former path?
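
One way to avoid hard-coding a numeric id such as 3179475 is to match on a more stable feature of the element and join the alternatives with the XPath union operator `|`. The sketch below runs on made-up HTML fragments. The `abstract` class name, the fragments, and the helper `get_abstract` are assumptions for illustration only, so each publisher's real markup would need to be inspected before relying on these expressions.

```r
library(rvest)

# Hypothetical fragments standing in for two publisher layouts.
page_a <- read_html('<html><body><div id="3179475"><section class="abstract"><p>First abstract.</p></section></div></body></html>')
page_b <- read_html('<html><body><section id="Abs1"><p>Second abstract.</p></section></body></html>')

# Union of relative XPath expressions: a known id, or any <section> whose
# class attribute contains "abstract" (an assumed naming convention).
abstract_xpath <- '//*[@id="Abs1"] | //section[contains(@class, "abstract")]'

# Returns the text of the first matching node, or NA when nothing matches.
get_abstract <- function(page, xpath = abstract_xpath) {
  node <- html_node(page, xpath = xpath)
  html_text(node, trim = TRUE)
}

abstracts <- c(get_abstract(page_a), get_abstract(page_b))
```

Because a page with no match yields NA rather than an error, the same helper could be mapped over many pages and the gaps inspected afterwards.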



Solution 1:[1]

Here is the code I ended up using after getting feedback from @Axeman:

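library(stringr)  # str_remove() and the %>% pipe
# Recover the bare DOI, fetch its abstract from Crossref, drop the 'Abstract' label
data <- rcrossref::cr_abstract(str_remove(url, pattern = 'https://doi.org/')) %>%
  str_remove(pattern = 'Abstract')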
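
For a whole table of DOIs, a loop with error handling may be safer. This is a minimal sketch, assuming `cr_abstract()` is called with one DOI at a time and raises an error when Crossref has no abstract for it (both worth checking against the rcrossref documentation). The names `fetch_abstracts`, `fetch`, and `fake_fetch`, and the DOIs in the demonstration, are hypothetical.

```r
# Calls `fetch` once per DOI and records NA when a lookup fails.
# `fetch` defaults to rcrossref::cr_abstract; it is a parameter so the
# loop can be tried offline with a stand-in function.
fetch_abstracts <- function(dois, fetch = rcrossref::cr_abstract) {
  vapply(dois, function(doi) {
    tryCatch(as.character(fetch(doi))[1], error = function(e) NA_character_)
  }, character(1), USE.NAMES = FALSE)
}

# Offline demonstration with a stand-in fetcher and made-up DOIs.
fake_fetch <- function(doi) {
  if (doi == '10.0000/missing') stop('no abstract found')
  paste('Abstract for', doi)
}
demo <- fetch_abstracts(c('10.0000/example', '10.0000/missing'), fetch = fake_fetch)
```

With the real service, the call would look like `fetch_abstracts(table$DOI)`, and the NA entries mark the DOIs that still need scraping from the publisher page.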

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Noah_Seagull