Is it possible to scrape Amazon reviews by brand?

To scrape Amazon product reviews at scale, I want to find out whether it is possible to scrape all product reviews for a particular brand without knowing every ASIN and product description in advance.

Currently I am using a self-made R scraper. To scrape the reviews, I first need to collect the ASIN and product description from each product page. Because I am scraping many products that have only 5 to 30 review texts each, this takes time and involves a lot of manual work.
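
The two-step workflow described above (look up the ASIN, then fetch its review pages) might be easier to automate if the review URL is built from the ASIN alone. The following is only a minimal sketch: `build_review_url` is a hypothetical helper, and it assumes that Amazon resolves `/product-reviews/<ASIN>` without the product-description slug, which the question does not confirm. The second ASIN is a made-up placeholder.

```r
# Hypothetical helper: build a review-page URL from an ASIN and a page number.
# Assumption: the product-description slug in the URL path is optional.
build_review_url <- function(asin, page_num, domain = "www.amazon.de") {
  paste0(
    "https://", domain, "/product-reviews/", asin,
    "/?ie=UTF8&reviewerType=all_reviews&pageNumber=", page_num
  )
}

# Every combination of ASIN and page number (first two pages per product).
# "B000000000" is a placeholder, not a real product.
asins <- c("B00F43W2I6", "B000000000")
grid <- expand.grid(asin = asins, page_num = 1:2, stringsAsFactors = FALSE)

# One URL per row of the grid, as a plain character vector
urls <- mapply(build_review_url, grid$asin, grid$page_num, USE.NAMES = FALSE)
```

With a helper like this, the description no longer has to be copied by hand; only the list of ASINs remains to be collected.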

Here is my R code (note that Primavera is the brand name):

library(pacman)
# Load (and install if missing) the packages the scraper actually uses
pacman::p_load(dplyr, rvest, purrr)


#### SCRAPE

# Scrape one page of reviews for a single product; returns one row per review
scrape_amazon <- function(page_num) {
  url_reviews <- paste0("https://www.amazon.de/Primavera-Bio-Geschenkset-Zitrusdüfte-17ml/product-reviews/B00F43W2I6/ref=cm_cr_getr_d_paging_btm_next_3?ie=UTF8&reviewerType=all_reviews&pageNumber=", page_num)
  doc <- read_html(url_reviews)

  # Each review sits in an element whose id starts with "customer_review"
  doc %>%
    html_elements("[id^='customer_review']") %>%
    map_dfr(~ data.frame(
      review_title = .x %>% html_element(".review-title") %>% html_text2(),
      review_text = .x %>% html_element(".review-text-content") %>% html_text2(),
      review_star = .x %>% html_element(".review-rating") %>% html_text2(),
      # Keep only the date part that follows "vom " in the German date line
      date = .x %>% html_element(".review-date") %>% html_text2() %>% gsub(".*vom ", "", .),
      author = .x %>% html_element(".a-profile-name") %>% html_text2(),
      helpful_votes = .x %>% html_element(".a-size-base.a-color-tertiary.cr-vote-text") %>% html_text2(),
      verified = .x %>% html_element(".a-size-mini.a-color-state.a-text-bold") %>% html_text2(),
      page = page_num
    )) %>%
    as_tibble()
}

# Scrape pages 1 to 20 and combine the results into one data frame
output <- map_dfr(1:20, scrape_amazon)
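
For the brand-level part of the question, one possible route is to collect ASINs from a brand search results page and then feed them to the review scraper. The sketch below is not a confirmed solution: `extract_asins` is a hypothetical helper, and it assumes that each search result carries a `data-asin` attribute, which is an assumption about Amazon's markup that may change at any time. The HTML is a hand-written stand-in rather than real Amazon output, and `B0EXAMPLE1` is a placeholder.

```r
library(rvest)

# Hypothetical helper: pull unique, non-empty ASINs out of a parsed page.
# Assumption: result elements expose the ASIN in a `data-asin` attribute.
extract_asins <- function(doc) {
  asins <- doc %>%
    html_elements("[data-asin]") %>%
    html_attr("data-asin")
  unique(asins[nzchar(asins)])
}

# Hand-written stand-in for a search results page (not real Amazon markup)
sample_html <- paste0(
  '<div data-asin="B00F43W2I6"></div>',
  '<div data-asin=""></div>',
  '<div data-asin="B00F43W2I6"></div>',
  '<div data-asin="B0EXAMPLE1"></div>'
)

doc <- read_html(sample_html)
asins <- extract_asins(doc)
```

On a live page, `doc` would come from `read_html()` on the brand search URL instead of the stand-in string, and the resulting vector could drive a loop over products in place of the manually collected ASINs.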


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
