Using rvest html_nodes() to store the li elements for each scraped item

I am trying to scrape some data. For example, I can use the following:

library(rvest)

"https://www.fotocasa.es/es/comprar/viviendas/barcelona-capital/sagrada-familia/l/19/" %>% 
  read_html() %>% 
  html_nodes(".re-CardFeatures-wrapper")

This returns a nodeset with the following structure:

List of 2
 $ :List of 2
  ..$ node:<externalptr> 
  ..$ doc :<externalptr> 
  ..- attr(*, "class")= chr "xml_node"
 $ :List of 2
  ..$ node:<externalptr> 
  ..$ doc :<externalptr> 
  ..- attr(*, "class")= chr "xml_node"
 - attr(*, "class")= chr "xml_nodeset"

These two nodes correspond to two properties (listings) on the website.

I am interested in extracting the li items from these lists:

"https://www.fotocasa.es/es/comprar/viviendas/barcelona-capital/sagrada-familia/l/19/" %>% 
  read_html() %>% 
  html_nodes(".re-CardFeatures-wrapper") %>% 
  html_nodes("li")

Which gives:

{xml_nodeset (10)}
 [1] <li class="re-CardFeatures-feature">2 habs.</li>\n
 [2] <li class="re-CardFeatures-feature">1 baño</li>\n
 [3] <li class="re-CardFeatures-feature">60 m²</li>\n
 [4] <li class="re-CardFeatures-feature">3ª Planta</li>\n
 [5] <li class="re-CardFeatures-feature">Balcón</li>
 [6] <li class="re-CardFeatures-feature">3 habs.</li>\n
 [7] <li class="re-CardFeatures-feature">1 baño</li>\n
 [8] <li class="re-CardFeatures-feature">75 m²</li>\n
 [9] <li class="re-CardFeatures-feature">5ª Planta</li>\n
[10] <li class="re-CardFeatures-feature">Ascensor</li>

However, this flattens everything into a single nodeset of 10 items, breaking the two-element structure I originally had (one element per property).

My question is: how can I extract the li nodes for both properties while keeping them grouped by the property they belong to?

That is, the list should "break" before "3 habs.", since that is the first item of the second property.



Solution 1:[1]

To keep the two-element structure, apply html_nodes("li") to each wrapper node separately with lapply:

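library(dplyr)  # only used for the %>% pipe here (rvest also re-exports it)
library(rvest)

# One node per property card
house <- "https://www.fotocasa.es/es/comprar/viviendas/barcelona-capital/sagrada-familia/l/19/" %>% 
  read_html() %>% 
  html_nodes(".re-CardFeatures-wrapper") 

# Search for li nodes inside each card separately, so the results stay grouped
lis <- lapply(house, function(x) x %>% html_nodes("li"))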

Now lis is a list with one element per property, each holding only that property's li nodes.
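To check the approach without hitting the live site, here is a minimal offline sketch. The HTML passed to minimal_html() (available in rvest 1.0.0 and later) is a hypothetical, simplified stand-in for the fotocasa markup: it assumes each .re-CardFeatures-wrapper contains a ul of li items and reuses the feature text from the output in the question. It goes one step past lis by converting each property's nodes to text with html_text() and then building a long data frame with a property index.

```r
library(rvest)

# Hypothetical offline stand-in for the listing page: two property cards,
# each with its own feature list (structure assumed, text from the question)
page <- minimal_html('
  <div class="re-CardFeatures-wrapper"><ul>
    <li class="re-CardFeatures-feature">2 habs.</li>
    <li class="re-CardFeatures-feature">1 baño</li>
    <li class="re-CardFeatures-feature">60 m²</li>
    <li class="re-CardFeatures-feature">3ª Planta</li>
    <li class="re-CardFeatures-feature">Balcón</li>
  </ul></div>
  <div class="re-CardFeatures-wrapper"><ul>
    <li class="re-CardFeatures-feature">3 habs.</li>
    <li class="re-CardFeatures-feature">1 baño</li>
    <li class="re-CardFeatures-feature">75 m²</li>
    <li class="re-CardFeatures-feature">5ª Planta</li>
    <li class="re-CardFeatures-feature">Ascensor</li>
  </ul></div>')

# One node per property card
cards <- page %>% html_nodes(".re-CardFeatures-wrapper")

# Calling html_nodes() on a single card only searches inside that card,
# so each list element holds the features of exactly one property
features <- lapply(cards, function(card) {
  card %>% html_nodes("li") %>% html_text(trim = TRUE)
})

# Long format: one row per feature, tagged with the property it came from
features_df <- data.frame(
  property = rep(seq_along(features), lengths(features)),
  feature  = unlist(features)
)

str(features)
print(features_df)
```

Keeping property as an explicit column means the grouping survives later steps such as filtering or reshaping to one row per property.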

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Nad Pat