Scraping all the reviews of an IMDB movie in R
I wrote code to scrape the review and the detailed review for a movie.
But it only scrapes information that has already been loaded into the page. (For example, if there are 1000 reviews, the web page shows only the first 10. The other reviews are displayed after clicking "Load more".)
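library(rvest)
library(dplyr)
# Parse the reviews page (contains only the reviews loaded initially)
MOVIE_URL <- read_html("https://www.imdb.com/title/tt0167260/reviews?ref_=tt_urv")
# Text of the links inside each review item
ex_review <- MOVIE_URL %>% html_nodes(".lister-item a") %>%
  html_text()
# Text of each detailed review
detailed <- MOVIE_URL %>% html_nodes(".content") %>%
  html_text()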
Is there a way to scrape the information of every review?
Solution 1:[1]
This is similar to a previous question (How to scrape all the movie reviews from IMDB using rvest), though the answer no longer works.
Now, when you are looking at a single page of reviews, say (https://www.imdb.com/title/tt0167260/reviews), you can load the next page of reviews via the URL:
movie_url <- paste0("https://www.imdb.com/title/tt0167260/reviews/_ajax?&paginationKey=", pagination_key)
where pagination_key is the value of the data-key attribute hidden in the HTML under:
<div class="load-more-data" data-key="g4xolermtiqhejcxxxgs753i36t52q343andv6xeade6qp6qwx57ziim2edmxvqz2tftug54" data-ajaxurl="/title/tt0167260/reviews/_ajax">.
So if you retrieve the html from movie_url = "https://www.imdb.com/title/tt0167260/reviews/_ajax?&paginationKey=g4xolermtiqhejcxxxgs753i36t52q343andv6xeade6qp6qwx57ziim2edmxvqz2tftug54" you will get the second page of reviews.
To access the third page, repeat the process: look for the pagination key in this second page and request the corresponding URL.
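The steps above can be sketched as a loop in R. This is only a minimal sketch, assuming the page still exposes a `.load-more-data` div with a `data-key` attribute as described in this answer, and that `.content` (the selector from the question) still matches the detailed review text. The function names `get_pagination_key` and `scrape_all_reviews`, and the `max_pages` and `delay` parameters, are made up for illustration.

```r
library(rvest)

# Hypothetical helper: return the data-key of the "load more" div,
# or NA if the page has none (assumed to mean there are no further pages).
get_pagination_key <- function(page) {
  keys <- page %>% html_nodes(".load-more-data") %>% html_attr("data-key")
  if (length(keys) == 0) NA_character_ else keys[[1]]
}

# Hypothetical helper: collect review text page by page.
scrape_all_reviews <- function(
    first_url = "https://www.imdb.com/title/tt0167260/reviews",
    ajax_url  = "https://www.imdb.com/title/tt0167260/reviews/_ajax",
    max_pages = 200,  # safety cap so the loop always terminates
    delay     = 1     # seconds to pause between requests
) {
  page <- read_html(first_url)
  reviews <- character(0)
  for (i in seq_len(max_pages)) {
    # ".content" is the selector used in the question for the detailed review.
    reviews <- c(reviews, page %>% html_nodes(".content") %>% html_text())
    # Stop when the current page offers no key for a next page.
    key <- get_pagination_key(page)
    if (is.na(key)) break
    Sys.sleep(delay)
    page <- read_html(paste0(ajax_url, "?paginationKey=", key))
  }
  reviews
}
```

Calling `all_reviews <- scrape_all_reviews()` would then return a character vector with one element per matched review, provided the site's markup still matches these assumptions.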
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | A. Bollans |
