Why does my web scrape return only a few results even though the html_elements selectors match all the relevant data on the page?
I am trying to scrape the following website: willhaben.
I used the following code to scrape the address, surface area, rooms, cost, and href of each property from a list of 21 result pages.
library(rvest)
library(dplyr)
library(stringr)
links <- paste0("https://www.willhaben.at/iad/immobilien/eigentumswohnung/wien?page=", 200:220)

# one CSS selector per page (21 pages)
address_css <- rep(".kSOEKM .khvLsE", 21)
apt_addrs <- mapply(function(link, css) link %>% read_html() %>% html_elements(css) %>% html_text(), links, address_css)

cost_css <- rep(".eRKVmh", 21)
apt_cost <- mapply(function(link, css) link %>% read_html() %>% html_elements(css) %>% html_text(), links, cost_css)

surf_css <- rep(".iLQwFF:nth-child(1) .jXuiQ", 21)
apt_surf <- mapply(function(link, css) link %>% read_html() %>% html_elements(css) %>% html_text(), links, surf_css)

room_css <- rep(".iLQwFF+ .iLQwFF .jXuiQ", 21)
apt_rooms <- mapply(function(link, css) link %>% read_html() %>% html_elements(css) %>% html_text(), links, room_css)

href_css <- rep(".faMxZw", 21)
apt_href <- mapply(function(link, css) link %>% read_html() %>% html_elements(css) %>% html_text(), links, href_css)
But I only get data for the first five properties on each page, sometimes just four. I tried different CSS selectors, but the results were the same.

I want to get data for all 25 apartments on each page. Thank you in advance.
Solution 1:
The website loads its listings with JavaScript, so a plain `read_html()` only sees the handful of cards present in the initial HTML. You can render the page and extract the info using RSelenium:
library(RSelenium)
library(rvest)
library(dplyr)
driver <- rsDriver(browser = "firefox")
remDr <- driver[["client"]]
url <- "https://www.willhaben.at/iad/immobilien/eigentumswohnung/wien?page=200"

# navigate to the webpage
remDr$navigate(url)

# accept the cookie banner
remDr$findElement(using = "xpath", '//*[@id="didomi-notice-agree-button"]')$clickElement()

# scroll step by step so the lazy-loaded listings render
webElem <- remDr$findElement("css", "body")
for (i in 1:9) {
  Sys.sleep(1)
  webElem$sendKeysToElement(list(key = "page_down"))
}
# get prices from the fully rendered page
remDr$getPageSource()[[1]] %>%
  read_html() %>%
  html_nodes('.eRKVmh') %>%
  html_text2()
[1] "€ 800.000" "€ 840.000" "€ 3.475.000" "€ 300.000" "€ 319.000,10" "€ 316.000,10" "€ 499.000" "€ 405.000" "€ 396.000" "€ 639.900"
[11] "€ 350.000" "€ 370.000" "€ 650.000" "€ 1.080.000" "€ 450.000" "€ 675.000" "€ 785.000" "€ 260.000" "€ 4.900.000" "€ 525.000"
[21] "€ 265.000" "€ 420.000" "€ 285.000" "€ 449.000"
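To cover all 21 pages and all of the fields from the question, the same scroll-then-parse pattern can be wrapped in a helper and applied to each URL. This is an untested sketch: `scrape_page()` is a hypothetical helper name, the CSS classes are copied from the question and may change whenever the site is redeployed, and the scroll count (9 page-downs) is taken from the answer above.

```r
library(RSelenium)
library(rvest)
library(dplyr)

# Hypothetical helper: navigate to one results page, scroll so the
# lazy-loaded cards render, then parse the rendered HTML.
scrape_page <- function(remDr, url) {
  remDr$navigate(url)
  body <- remDr$findElement("css", "body")
  for (i in 1:9) {
    Sys.sleep(1)
    body$sendKeysToElement(list(key = "page_down"))
  }
  page <- read_html(remDr$getPageSource()[[1]])
  # Fields are collected as separate vectors; their lengths can differ
  # if a card omits a field, so a more robust version would select each
  # listing card first and extract the fields per card.
  list(
    address = page %>% html_elements(".kSOEKM .khvLsE") %>% html_text2(),
    cost    = page %>% html_elements(".eRKVmh") %>% html_text2(),
    surface = page %>% html_elements(".iLQwFF:nth-child(1) .jXuiQ") %>% html_text2(),
    rooms   = page %>% html_elements(".iLQwFF+ .iLQwFF .jXuiQ") %>% html_text2(),
    # note html_attr("href"), not html_text(), to get the link target
    href    = page %>% html_elements(".faMxZw") %>% html_attr("href")
  )
}

driver <- rsDriver(browser = "firefox")
remDr  <- driver[["client"]]

urls <- paste0("https://www.willhaben.at/iad/immobilien/eigentumswohnung/wien?page=", 200:220)

# Accept the cookie banner once per session, then walk every page.
remDr$navigate(urls[1])
remDr$findElement(using = "xpath", '//*[@id="didomi-notice-agree-button"]')$clickElement()
pages <- lapply(urls, function(u) scrape_page(remDr, u))

remDr$close()
driver$server$stop()
```

`pages` is then a list of 21 per-page field lists, which can be bound into a single data frame once the per-card length mismatches are resolved.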
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Nad Pat |

