How to save multiple matches in one column? rvest, R and stringr

This question is a follow-up to an earlier Stack Overflow question.

I have these two example HTML files: url1.html and url2.html.

url3.html is another example with more IPC entries.

url2.html contains no (51) information, while url1.html does.

I'm using this code in R:

library(rvest)
library(tidyverse)
library(stringr)

x<-data.frame(
    URL=c(1:2),
    page=c(paste(readLines("url1.html"), collapse="\n"),
                 paste(readLines("url2.html"), collapse="\n"))
) 

for (i in 1:nrow(x)){
    html<-x$page[i]%>% unclass() %>% unlist()
    read_html(html,encoding = "ISO-8859-1") %>% 
        rvest::html_elements(xpath = '//*[@id="principal"]/table[2]') %>%
        html_nodes(xpath='//div[@id="classificacao0"]') %>%  
        html_text(trim=T)%>%  
        str_replace_all(.,"[\\n\\r\\t]+", "")%>%
        stringr::str_trim( ) -> tmp
    
    if(length(tmp) == 0) tmp <- "ND"
    x$ipc_0[i] <- tmp %>% str_replace_all(.,"\\s+", " ") %>% str_replace_all(.," \\)", "\\)")
}

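# Same extraction for the second classification div (classificacao1);
# loop over the rows of x, the data frame defined above
for (i in 1:nrow(x)){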
    html<-x$page[i]%>% unclass() %>% unlist()
    read_html(html,encoding = "ISO-8859-1") %>% 
        rvest::html_elements(xpath = '//*[@id="principal"]/table[2]') %>%
        html_nodes(xpath='//div[@id="classificacao1"]') %>%  
        html_text(trim=T)%>%  
        str_replace_all(.,"[\\n\\r\\t]+", "")%>%
        stringr::str_trim( ) -> tmp
    
    if(length(tmp) == 0) tmp <- "ND"
    x$ipc_1[i] <- tmp %>% str_replace_all(.,"\\s+", " ") %>% str_replace_all(.," \\)", "\\)")
}

Result: partially correct

Desired result: create a new data frame with the following structure.

URL IPC
1 B62B 1/16 (1968.09)...
1 B62B 1/00 (1968.09)...
2 ND

Problem: Some URLs have the code (51) and others do not. When the code (51) is present, the page can contain "n" divs matching xpath='//div[@id="classificacao0"]', where the numeric suffix of the id (classificacao0, classificacao1, ...) runs from 0 to "n". How can I optimize this code to capture the necessary information without writing a separate for (variable in vector) loop for each "n"?

Any idea how to solve this problem?
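
One possible direction, sketched under assumptions rather than tested against the real pages: instead of one loop per id, a single XPath using starts-with() could match every div whose id begins with "classificacao", and the matches could be stacked into a long data frame with one row per IPC. The inline HTML strings below are hypothetical stand-ins for url1.html and url2.html, and the helper name extract_ipc is made up for illustration. It also assumes no unrelated element on the page has an id starting with "classificacao".

```r
library(rvest)    # also re-exports the %>% pipe
library(stringr)

# Hypothetical stand-ins for url1.html (has IPC divs) and url2.html (has none)
pages <- c(
  '<html><body><div id="principal">
     <div id="classificacao0">B62B 1/16 (1968.09)</div>
     <div id="classificacao1">B62B 1/00
        (1968.09 )</div>
   </div></body></html>',
  '<html><body><div id="principal"><p>no classification</p></div></body></html>'
)

# Illustrative helper: returns every classificacaoN text on a page, or "ND"
extract_ipc <- function(html) {
  ipc <- read_html(html) %>%
    html_elements(xpath = '//div[starts-with(@id, "classificacao")]') %>%
    html_text(trim = TRUE) %>%
    str_squish() %>%                      # collapse newlines, tabs, repeated spaces
    str_replace_all(" \\)", ")")          # same " )" cleanup as the original loops
  if (length(ipc) == 0) "ND" else ipc
}

ipc_list <- lapply(pages, extract_ipc)

# One row per match; the URL index is repeated once per IPC found on that page
result <- data.frame(
  URL = rep(seq_along(pages), lengths(ipc_list)),
  IPC = unlist(ipc_list)
)
print(result)
```

With the real data, pages would be x$page, and the encoding = "ISO-8859-1" argument from the original read_html() call could be passed through if the files need it.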



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source