Is it possible to scrape Amazon reviews by brand?
To scrape Amazon product reviews at scale, I want to find out whether it is possible to scrape all product reviews for a particular brand without knowing every product's ASIN and product description in advance.
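One possible way to avoid collecting ASINs by hand is to read them from a search-results page for the brand. The sketch below assumes that each search result is a `div` carrying a non-empty `data-asin` attribute, which is how Amazon's result markup has commonly looked; that selector and the hypothetical `extract_asins()` helper are assumptions and may break whenever Amazon changes its HTML. The function accepts already-downloaded HTML, so it can be tried on an inline sample without any network request.

```r
library(rvest)

# Hypothetical helper: pull ASINs out of a search-results page.
# ASSUMPTION: results are <div> elements with a data-asin attribute;
# Amazon's markup changes often, so verify the selector before relying on it.
extract_asins <- function(html) {
  page <- if (inherits(html, "xml_document")) html else read_html(html)
  asins <- html_attr(html_elements(page, "div[data-asin]"), "data-asin")
  # Drop empty attributes (placeholders, ads) and duplicate ASINs
  unique(asins[!is.na(asins) & nzchar(asins)])
}

# Example with a small inline page instead of a live request
sample_html <- '<html><body>
  <div data-asin="B00F43W2I6"></div>
  <div data-asin=""></div>
  <div data-asin="B000000001"></div>
  <div data-asin="B00F43W2I6"></div>
</body></html>'

asins <- extract_asins(sample_html)
print(asins)
# [1] "B00F43W2I6" "B000000001"
```

Against a live page, `html` could be the result of `read_html()` on a keyword search such as `https://www.amazon.de/s?k=Primavera`. Search results can include other brands and span several result pages, so the ASIN list would likely still need filtering.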
Currently I am using a self-made R scraper. To scrape the reviews, I first have to collect the ASIN and product description from each product page. Because I am scraping many products with only 5 to 30 reviews each, this takes time and involves a lot of manual work.
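Collecting the product description may not be necessary. Amazon review URLs often resolve without the descriptive slug (the `Primavera-Bio-Geschenkset-...` part), so it may be enough to build the review-page URL from the ASIN alone. This is an assumption worth checking on amazon.de; `review_url()` is a hypothetical helper that simply reuses the query parameters from the scraper's URL.

```r
# Hypothetical helper: build a review-page URL from an ASIN only.
# ASSUMPTION: amazon.de accepts /product-reviews/<ASIN>/ without the
# product-name slug; the query parameters mirror those in the scraper.
review_url <- function(asin, page_num, domain = "www.amazon.de") {
  stopifnot(is.character(asin), length(asin) == 1, nzchar(asin))
  paste0(
    "https://", domain, "/product-reviews/", asin,
    "/?ie=UTF8&reviewerType=all_reviews&pageNumber=", page_num
  )
}

url <- review_url("B00F43W2I6", 3)
print(url)
# [1] "https://www.amazon.de/product-reviews/B00F43W2I6/?ie=UTF8&reviewerType=all_reviews&pageNumber=3"
```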
Here is my R code (note that Primavera is the brand name):
```r
library(pacman)
pacman::p_load(dplyr, rvest, purrr)

#### SCRAPE
# Scrape one page of reviews for a single product (ASIN B00F43W2I6)
scrape_amazon <- function(page_num) {
  url_reviews <- paste0("https://www.amazon.de/Primavera-Bio-Geschenkset-Zitrusdüfte-17ml/product-reviews/B00F43W2I6/ref=cm_cr_getr_d_paging_btm_next_3?ie=UTF8&reviewerType=all_reviews&pageNumber=", page_num)
  doc <- read_html(url_reviews)
  # One row per review block (elements whose id starts with "customer_review")
  map_dfr(doc %>% html_elements("[id^='customer_review']"), ~ data.frame(
    review_title  = .x %>% html_element(".review-title") %>% html_text2(),
    review_text   = .x %>% html_element(".review-text-content") %>% html_text2(),
    review_star   = .x %>% html_element(".review-rating") %>% html_text2(),
    # German dates read "... vom <date>"; keep only the date part
    date          = .x %>% html_element(".review-date") %>% html_text2() %>% gsub(".*vom ", "", .),
    author        = .x %>% html_element(".a-profile-name") %>% html_text2(),
    helpful_votes = .x %>% html_element(".a-size-base.a-color-tertiary.cr-vote-text") %>% html_text2(),
    verified      = .x %>% html_element(".a-size-mini.a-color-state.a-text-bold") %>% html_text2(),
    page          = page_num
  )) %>%
    as_tibble()
}

# Loop over the first 20 review pages and combine the results
datalist <- list()
for (i in 1:20) {
  datalist[[i]] <- scrape_amazon(page_num = i)
  Sys.sleep(1)  # short pause between requests
}
output <- bind_rows(datalist)
```
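The loop above always requests 20 pages. Since many products only have 5 to 30 reviews, a variant that stops at the first page without reviews, and repeats this for a vector of ASINs (for example, the products of one brand), may save requests. The sketch below is written against a hypothetical `fetch_page(asin, page_num)` function, such as a version of `scrape_amazon()` that takes the ASIN as an extra argument; `scrape_asin()`, `scrape_brand()` and the stub `fake_fetch()` are illustrative names, and the stub stands in for real network requests.

```r
# Hypothetical helper: fetch review pages for one ASIN until a page
# comes back empty, then tag every row with that ASIN.
scrape_asin <- function(asin, fetch_page, max_pages = 50, delay = 1) {
  pages <- list()
  for (p in seq_len(max_pages)) {
    df <- fetch_page(asin, p)
    if (is.null(df) || nrow(df) == 0) break  # no more reviews
    pages[[length(pages) + 1]] <- df
    Sys.sleep(delay)  # pause between requests
  }
  if (length(pages) == 0) return(NULL)
  out <- do.call(rbind, pages)
  out$asin <- asin
  out
}

# Hypothetical helper: scrape several ASINs into one data frame
scrape_brand <- function(asins, fetch_page, ...) {
  results <- lapply(asins, scrape_asin, fetch_page = fetch_page, ...)
  results <- Filter(Negate(is.null), results)  # skip ASINs without reviews
  do.call(rbind, results)
}

# Stub fetcher: product "A1" has 2 review pages, "A2" has 1
fake_fetch <- function(asin, page_num) {
  n_pages <- c(A1 = 2, A2 = 1)[[asin]]
  if (page_num > n_pages) return(data.frame())
  data.frame(review_title = paste(asin, "review on page", page_num),
             page = page_num)
}

reviews <- scrape_brand(c("A1", "A2"), fake_fetch, delay = 0)
print(reviews)
```

If pages can come back with differing columns, `dplyr::bind_rows()` is more forgiving than `do.call(rbind, ...)`.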
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
