Why does web scraping return only a few items even though the html_elements selector matches all the relevant data on the web page?

I am trying to scrape the following website: willhaben.

I used the following code to scrape the address, surface area, number of rooms, cost and href of each property from a list of 21 result pages.

library(rvest)
library(dplyr)
library(stringr)


links <- paste0("https://www.willhaben.at/iad/immobilien/eigentumswohnung/wien?page=", 200:220)

address_css <- rep(".kSOEKM .khvLsE", 21)
apt_addrs <- mapply(function(links, address_css) links %>% read_html() %>% html_elements(address_css) %>% html_text(), links, address_css)

cost_css <- rep(".eRKVmh", 21)
apt_cost <- mapply(function(links, cost_css) links %>% read_html() %>% html_elements(cost_css) %>% html_text(), links, cost_css)

surf_css <- rep(".iLQwFF:nth-child(1) .jXuiQ", 21)
apt_surf <- mapply(function(links, surf_css) links %>% read_html() %>% html_elements(surf_css) %>% html_text(), links, surf_css)

room_css <- rep(".iLQwFF+ .iLQwFF .jXuiQ", 21)
apt_rooms <- mapply(function(links, room_css) links %>% read_html() %>% html_elements(room_css) %>% html_text(), links, room_css)

href_css <- rep(".faMxZw", 21)
apt_href <- mapply(function(links, href_css) links %>% read_html() %>% html_elements(href_css) %>% html_text(), links, href_css)
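A side note on the last snippet: `html_text()` returns the link's text, not its URL. To get the actual href values you would read the attribute instead. A minimal sketch, assuming the `.faMxZw` class from the question still matches the anchor elements (such generated class names change when the site is rebuilt):

```r
library(rvest)
library(magrittr)

links <- paste0("https://www.willhaben.at/iad/immobilien/eigentumswohnung/wien?page=", 200:220)

# Read the href attribute of each matched anchor rather than its text.
# ".faMxZw" is taken from the question and may no longer be valid.
apt_href <- lapply(links, function(link) {
  link %>%
    read_html() %>%
    html_elements(".faMxZw") %>%
    html_attr("href")
})
```

This does not fix the "only 5 results" problem (see below), but it returns usable URLs for the listings that are present in the static HTML.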

But I only get data from the first 5 properties on each page, sometimes only 4. I tried different CSS classes, but the results were the same. The screenshot below shows an example of what I got.

[Screenshot: href of the apartment]

I want to get the data for all 25 apartments on each page. Thank you in advance.



Solution 1:[1]

The website loads its listings with JavaScript as you scroll, so a plain read_html() only sees the first few entries. You can drive a real browser with RSelenium, scroll to trigger the lazy loading, and then extract the info:

library(RSelenium)
library(rvest)
library(dplyr)
driver = rsDriver(browser = c("firefox")) 
remDr <- driver[["client"]]

url = "https://www.willhaben.at/iad/immobilien/eigentumswohnung/wien?page=200"

#navigate to webpage
remDr$navigate(url)

#accept cookie
remDr$findElement(using = "xpath",'//*[@id="didomi-notice-agree-button"]')$clickElement()

#scroll step by step
webElem <- remDr$findElement("css", "body")

for (i in 1:9){
  Sys.sleep(1)
  webElem$sendKeysToElement(list(key = "page_down"))
}

#get prices
remDr$getPageSource()[[1]] %>% 
  read_html() %>% 
  html_nodes('.eRKVmh') %>% 
  html_text2()

[1] "€ 800.000"    "€ 840.000"    "€ 3.475.000"  "€ 300.000"    "€ 319.000,10" "€ 316.000,10" "€ 499.000"    "€ 405.000"    "€ 396.000"    "€ 639.900"   
[11] "€ 350.000"    "€ 370.000"    "€ 650.000"    "€ 1.080.000"  "€ 450.000"    "€ 675.000"    "€ 785.000"    "€ 260.000"    "€ 4.900.000"  "€ 525.000"   
[21] "€ 265.000"    "€ 420.000"    "€ 285.000"    "€ 449.000"   
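To extend this to all 21 pages and collect the fields into one table, you can wrap the navigate/scroll/parse steps in a function and loop over the URLs. A rough sketch, assuming the CSS classes from the question are still valid (generated class names like `.eRKVmh` change whenever the site is rebuilt) and that each selector returns one match per listing; if the match counts differ, collect the vectors separately instead of in one tibble:

```r
library(RSelenium)
library(rvest)
library(dplyr)

driver <- rsDriver(browser = "firefox")
remDr <- driver[["client"]]

scrape_page <- function(url) {
  remDr$navigate(url)
  # accept the cookie banner if it appears (only shown on the first visit)
  tryCatch(
    remDr$findElement("xpath", '//*[@id="didomi-notice-agree-button"]')$clickElement(),
    error = function(e) NULL
  )
  # scroll step by step so the lazy-loaded listings render
  webElem <- remDr$findElement("css", "body")
  for (i in 1:9) {
    Sys.sleep(1)
    webElem$sendKeysToElement(list(key = "page_down"))
  }
  page <- remDr$getPageSource()[[1]] %>% read_html()
  # class names taken from the question; they may have changed since
  tibble(
    address = page %>% html_elements(".kSOEKM .khvLsE") %>% html_text2(),
    cost    = page %>% html_elements(".eRKVmh") %>% html_text2()
  )
}

urls <- paste0("https://www.willhaben.at/iad/immobilien/eigentumswohnung/wien?page=", 200:220)
apartments <- bind_rows(lapply(urls, scrape_page))
```

The Sys.sleep(1) between scroll steps gives the page time to fetch and render each batch of listings; shortening it risks parsing the page before all 25 entries exist in the DOM.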

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source

Solution 1: Nad Pat