Pulling all Yelp reviews via BeautifulSoup

I need some help pulling all reviews for a hotel using Beautiful Soup. This is what I have so far, but I need some inspiration for pulling all the reviews, either via an API or by regular scraping.

import time
import random
import urllib.request

from bs4 import BeautifulSoup as bs


# Fetch the business page and parse it
html = urllib.request.urlopen('https://www.yelp.com/biz/shore-cliff-hotel-pismo-beach-2').read().decode('utf-8')

soup = bs(html, 'html.parser')

# Each review body is a <p> with these (obfuscated, Yelp-version-specific) classes
relevant = soup.find_all('p', class_='comment__09f24__gu0rG css-qgunke')

reviews = []

for div in relevant:
    for html_class in div.find_all('span', class_="raw__09f24__T4Ezm"):
        review = html_class.getText()
        reviews.append(review)


Solution 1:[1]

This does the job:

import requests
from bs4 import BeautifulSoup

base_url = "https://www.yelp.com/biz/capri-laguna-laguna-beach"
new_page = "?start={}"

reviews = []

# 51 pages of 10 reviews each: offsets 0, 10, ..., 500
for i in range(0, 501, 10):
    new_page_url = base_url + new_page.format(i)

    new_content = requests.get(new_page_url).content
    new_soup = BeautifulSoup(new_content, "html.parser")

    relevant = new_soup.find_all('p', class_='comment__09f24__gu0rG css-qgunke')

    for div in relevant:
        for html_class in div.find_all('span', class_="raw__09f24__T4Ezm"):
            review = html_class.getText()
            reviews.append(review)
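
Assuming the class names above still match the live page, reviews ends up holding one string per review; a quick sanity check:

print(len(reviews))   # up to 510 for 51 pages of 10 reviews
print(reviews[:2])    # first two scraped reviews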

Code explanation:

If you click through to the 2nd page, you'll see that ?start=10 gets added to the base URL https://www.yelp.com/biz/capri-laguna-laguna-beach. On the 3rd page it becomes ?start=20, and so on. The number is the zero-based index of the first review on that page, and each page holds 10 reviews. There are 51 pages in total, so the first review on the 51st page has index 500, and the last URL suffix is ?start=500.
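If you'd rather not hardcode 501, the same offsets can be derived from the total review count shown on the business page. A minimal sketch; total_reviews is a made-up value here that you would read off the page yourself:

total_reviews = 508  # hypothetical count, shown near the top of the business page

# One ?start= offset per page: 0, 10, 20, ..., covering every review index
offsets = range(0, total_reviews, 10)
print(list(offsets)[-1])  # 500 -> the ?start= value for the last page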

So for each page on the website, the code builds a new URL, gets the HTML content at that URL, creates a soup for it, and pulls the reviews out of that newly created soup.
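One practical caveat: firing off 51 requests back to back is a good way to get rate-limited or blocked by Yelp. A small helper sketch that spaces the requests out, reusing the time and random imports from the question (the 1-3 second range is an arbitrary choice, not something Yelp documents):

import time
import random
import requests

def polite_get(page_url):
    # Fetch the page, then pause 1-3 seconds so consecutive
    # requests are spaced out
    response = requests.get(page_url)
    time.sleep(random.uniform(1, 3))
    return response.content

Swapping requests.get(new_page_url).content for polite_get(new_page_url) in the loop above leaves the pagination logic unchanged. As for the "via API" part of the question: Yelp's official Fusion API does have a reviews endpoint, but at the time of writing it returns at most three review excerpts per business, so scraping the paginated HTML remains the usual route for pulling all reviews.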

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Zero