gsub() only working after copying vector back from output of dput()
I have the following problem: I scraped prices from multiple webpages. Because on some webpages the price is scraped with html_text(), it contains extra text such as the currency or ".-" after the price.
Now, if I try to remove these extra characters from the price with gsub(), it doesn't fully work.
And if I then try to convert the prices to integers with as.integer(), I get NA for every price.
The strange thing is that if I use dput() to print the contents of the vector in the console, copy that output and assign it to a new vector (like vec <- c("5.-","10.-","9.-")), it suddenly works and I can use gsub() and as.integer() properly.
Does anyone know why this could be happening?
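One way to test whether the scraped strings contain characters that only look like the typed ones is to compare their Unicode code points. The minimal base R sketch below assumes, purely for illustration, that the scraped price ends in an en dash (U+2013) instead of the ASCII hyphen-minus. The question does not confirm which character the page actually uses, and a non-breaking space (U+00A0) would cause the same kind of mismatch. `scraped`, `typed` and the helper `non_ascii()` are hypothetical names, with `scraped` standing in for one element of vec_galaxus.

```r
# Hypothetical stand-in for a scraped price: "5." followed by an en dash (U+2013)
scraped <- "5.\u2013"
# The same price typed by hand with an ASCII hyphen-minus (U+002D)
typed <- "5.-"

# Both may look almost identical in the console...
print(c(scraped, typed))

# ...but the last code point differs
utf8ToInt(scraped)  # 53 46 8211
utf8ToInt(typed)    # 53 46 45

# A pattern written with the ASCII hyphen therefore does not match the scraped text
gsub("\\.-", "", scraped)  # no match, string returned unchanged
gsub("\\.-", "", typed)    # "5"

# ...which is why the conversion afterwards yields NA
suppressWarnings(as.integer(gsub("\\.-", "", scraped)))  # NA
as.integer(gsub("\\.-", "", typed))                      # 5

# Hypothetical helper: list every non-ASCII character as a U+XXXX label
non_ascii <- function(x) {
  cp <- utf8ToInt(x)
  sprintf("U+%04X", cp[cp > 127])
}
non_ascii(scraped)  # "U+2013"
```

Running `non_ascii()` on a few elements of vec_galaxus would show whether such look-alike characters are present.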
The code I use to scrape the prices is:
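library(rvest)  # session(), session_jump_to(), read_html(), html_nodes(), html_text(), %>%
input_galaxus2 <- paste0('https://www.galaxus.ch/', input_galaxus$`Galaxus Artikel`)
vec_galaxus <- character(length(input_galaxus2))  # one slot per product URL
i <- 0                                              # counter must exist before i + 1
sess <- session(input_galaxus2[1])                  # to start the session
for (j in input_galaxus2) {
  sess <- sess %>% session_jump_to(j)               # jump to URL
  i <- i + 1
  try(vec_galaxus[i] <- read_html(sess) %>%         # can read directly from sess
        html_nodes('.sc-algx62-1.cwhzPP') %>%
        html_text())
  Sys.sleep(runif(1, min = 1, max = 2))             # random pause between requests
}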
Here the Galaxus Artikel column holds the product number that is pasted right after the base URL, for example 14513912, 14513929 or 8606656, so j is the full product URL.
Edit: the product links are, for example, https://www.galaxus.ch/14513912, https://www.galaxus.ch/14513929 and https://www.galaxus.ch/8606656
Solution 1
library(tidyverse)
library(rvest)
"https://www.galaxus.ch/8606656" %>%
  read_html() %>%
  html_nodes('.sc-algx62-1.cwhzPP') %>%  # element holding the price
  html_text() %>%
  str_extract("[0-9]+") %>%              # keep only the first run of digits
  as.integer()
#> [1] 385
Created on 2022-03-09 by the reprex package (v2.0.0)
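The same digit-extraction idea can be applied to the whole scraped vector at once. The sketch below is a base R variant of the `str_extract()` step (an alternative, not the answer's own code). vec_galaxus is filled with hypothetical values here: one ends in an en dash, one has a currency prefix, and one is NA, as a slot might be if scraping a page failed. Because the pattern keeps only digits, it does not matter which dash or currency text surrounds the number; `extract_first_integer()` is a hypothetical helper name.

```r
# Hypothetical scraped values: ASCII hyphen, en dash (U+2013), currency prefix, missing page
vec_galaxus <- c("5.-", "10.\u2013", "CHF 9.-", NA)

# Return the first run of digits as an integer; no match or NA gives NA
extract_first_integer <- function(x) {
  m <- regexpr("[0-9]+", x)            # match start, -1 for no match, NA for NA input
  out <- rep(NA_character_, length(x))
  hit <- !is.na(m) & m > 0             # elements that actually contain digits
  out[hit] <- regmatches(x, m)         # regmatches() returns only the matched elements
  as.integer(out)
}

extract_first_integer(vec_galaxus)
#> [1]  5 10  9 NA
```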
Use as.numeric() and the character class [0-9.,] to get the cents, too.
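A minimal base R sketch of that suggestion converts with `as.numeric()` instead of `as.integer()`. It uses the stricter pattern `[0-9]+(\\.[0-9]+)?`, an adaptation of the `[0-9.,]` hint that assumes `.` is the decimal separator, so the trailing `.` in `"5.-"` is not captured and commas are left out. The input values are hypothetical, and thousands separators are not handled.

```r
# Hypothetical scraped prices, with and without cents
prices_raw <- c("5.-", "10.\u2013", "CHF 9.95", "12.50")

# Digits, optionally followed by a decimal point and more digits
m <- regexpr("[0-9]+(\\.[0-9]+)?", prices_raw)
prices <- as.numeric(regmatches(prices_raw, m))  # assumes every element matches
prices
#> [1]  5.00 10.00  9.95 12.50
```

If some elements might not contain a price at all, the NA-preserving pattern from the previous sketch (checking `m > 0` before `regmatches()`) keeps the result aligned with the input.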
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | danlooo |
