Need help finding correct HTML attributes for scraping code
I want to scrape prices from a website, but I'm not sure how to properly select the right html node and attribute (or text).
So far, the code (that has worked with small adjustments for other websites) looks like this:

    vec_microspot <- vector()
    i = 0
    for (j in input_microspot$`Microspot Artikel`) {
      Sys.sleep(runif(1, min = 0.25, max = 0.5))
      i <- i + 1
      vec_microspot[i] <- try(paste0('https://www.microspot.ch/', j) %>%
        read_html %>%
        html_nodes('span') %>%
        html_attr('price'))
    }

The j in the code refers to the product number, which is pasted onto the base URL. Example product numbers are 0002708143 and 0001560873, so the links are e.g. https://www.microspot.ch/0002708143 and https://www.microspot.ch/0001560873
Or is it not possible to scrape prices from this website as the html_attr or html_text is different for every product?
Solution 1:[1]
You want the innerText of the target node which can be extracted with html_text. You need only a single node which, using the up-to-date syntax, can be returned by html_element (singular). Finally, to match on the right price node you can use the following css selector sequence: #container-productdetailPrice [id$=price]
So,
paste0('https://www.microspot.ch/', j) %>% read_html() %>% html_element('#container-productdetailPrice [id$=price]') %>% html_text()
This looks for an ancestor element with id container-productdetailPrice, then uses a descendant combinator (the " " in the selector) to look for any descendant, not only a direct child, whose id ends with price. The $= in [id$=price] is the "ends with" attribute operator.
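To see how the selector behaves without hitting the live site, here is a minimal sketch that runs it against a small inline HTML fragment. The markup and the ids other than container-productdetailPrice are invented for illustration and are only an assumption about what the real page might roughly look like.

```r
library(rvest)

# Hypothetical markup, made up for this example; the real page will differ.
page <- minimal_html('
  <div id="container-productdetailPrice">
    <span id="label">Price</span>
    <span id="js-product-price">199.00</span>
  </div>
  <span id="other-price">1.00</span>
')

# Match a descendant of #container-productdetailPrice whose id ends in "price".
# The span with id "other-price" sits outside the container, so it is ignored.
price <- page %>%
  html_element("#container-productdetailPrice [id$=price]") %>%
  html_text()

print(price)
```

Because html_element returns a single node, html_text yields one string per page, which is what a per-product loop needs.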
Further reading: https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors
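Applied to the loop from the question, the selector could be used roughly as follows. This is a sketch, not part of the original answer: get_price and scrape_prices are hypothetical helper names, and it assumes the site still serves the markup the selector was written for. A failed request returns NA instead of a try-error object.

```r
library(rvest)

# Hypothetical helper: price text for one product number,
# or NA if the page cannot be read.
get_price <- function(article, base_url = "https://www.microspot.ch/") {
  tryCatch(
    paste0(base_url, article) %>%
      read_html() %>%
      html_element("#container-productdetailPrice [id$=price]") %>%
      html_text(),
    error = function(e) NA_character_
  )
}

# Hypothetical helper: one price per product number, with the same
# short random pause between requests as in the question.
scrape_prices <- function(articles) {
  vapply(articles, function(a) {
    Sys.sleep(runif(1, min = 0.25, max = 0.5))
    get_price(a)
  }, character(1))
}

# Usage with the product numbers from the question (kept as strings so
# the leading zeros survive):
# scrape_prices(c("0002708143", "0001560873"))
```

If the selector matches nothing, html_element returns a missing node and html_text gives NA, so the result still has exactly one entry per product.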
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | QHarr |
