Scraping the rating of some reviews as pictures
I am trying to scrape the ratings of some movie reviews, but each rating is not a number. It is one of 10 different images, ranging from all empty stars to all full stars.
This is the website where I scrape the data from: https://www.cinemagia.ro/filme/avatar-17818/reviews/?pagina=1&order_direction=DESC
This is my code:
import requests
from bs4 import BeautifulSoup
url = 'https://www.cinemagia.ro/filme/avatar-17818/reviews/?pagina=1&order_direction=DESC'
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
rating=0
scraped_ratings = soup.find_all('span', class_='stelutze').find=("img")
for i in scraped_ratings:
    if "star_full.gif" in i.get("src"):
        rating += 1
print(rating)
Somebody helped me with this code but it only gives the rating of the first review.
rating=0
rawRating = soup.find("span", {"class": "stelutze"}).find_all("img")
for i in rawRating:
    if "star_full.gif" in i.get("src"):
        rating += 1
print(rating)
I tried to change the code to this:
rating=0
count=0
rawRating = soup.find_all("span", {"class": "stelutze"}).find_all("img")
for i in rawRating:
    if "star_full.gif" in i.get("src"):
        rating += 1
    count+= 1
    if count == 10:
        print(rating)
        rating=0
        count=0
But I get this error:
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
I think this is because I can't use two find_all in the same statement.
Any help?
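For context on the error, here is a minimal sketch. It uses a small inline HTML snippet (hypothetical markup that only imitates the stelutze spans, not the real page) to show that find returns a single Tag, while find_all returns a ResultSet. A ResultSet behaves like a list and has no find_all of its own. Chaining two searches is fine as long as the second one is called on a Tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating two rating spans; the real page may differ.
SAMPLE_HTML = """
<span class="stelutze"><img src="star_full.gif"><img src="star_empty.gif"></span>
<span class="stelutze"><img src="star_full.gif"><img src="star_full.gif"></span>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

first_span = soup.find("span", {"class": "stelutze"})     # a single Tag
all_spans = soup.find_all("span", {"class": "stelutze"})  # a ResultSet (list-like)

# A Tag can be searched further; a ResultSet has to be looped over first.
images_in_first = first_span.find_all("img")
images_per_span = [span.find_all("img") for span in all_spans]

# A CSS selector is another way to reach every nested image in one call.
all_images = soup.select("span.stelutze img")

print(len(images_in_first), [len(imgs) for imgs in images_per_span], len(all_images))
```

Note that the select call flattens all images into one list, so it loses the information about which review each image belongs to.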
Update. Now the code looks like this:
import requests
from bs4 import BeautifulSoup
pageNum = 1
for k in range(1, 17):
    url = f'https://www.cinemagia.ro/filme/avatar-17818/reviews/?pagina={pageNum}&order_direction=DESC'
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    scraped_movies = soup.find_all('div', class_='left comentariu')
    movies = []
    for movie in scraped_movies:
        movies.append(movie.get_text().strip())
    reviewCount = -1
    rating = 0
    count = 0
    rawRatings = soup.find_all("span", {"class": "stelutze"})
    for i in rawRatings:
        rawRating = i.find_all("img")
        for j in rawRating:
            if "star_full.gif" in j.get("src"):
                rating += 1
            count += 1
            if count == 10:
                reviewCount += 1
                print(rating)
                print(movies[reviewCount])
                rating = 0
                count = 0
    pageNum += 1
The only remaining problem: movies contains all the reviews, but not every review has a rating, while rawRatings contains only the ratings that exist. I want to print each rating followed by its respective review, but as soon as I reach a review without a rating, it gets the next rating in line, which misaligns everything from that point on.
Any idea how to check whether a review in movies has no rating? That way I could increment reviewCount by 2 instead of 1.
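One possible way around the mismatch, sketched under an assumption about the markup: instead of building two separate lists and lining them up by index, look for the rating inside the same container as each review, and record None when no rating span is found. The inline HTML below is hypothetical (the wrapper div with class "review" and the placement of the span next to the text are guesses, not taken from the real page), so the parent lookup would need adjusting to the actual structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each review sits in a wrapper that may or may not
# contain a rating span. The real page structure may differ.
SAMPLE_HTML = """
<div class="review">
  <span class="stelutze"><img src="/img/star_full.gif"><img src="/img/star_empty.gif"></span>
  <div class="left comentariu">Great movie</div>
</div>
<div class="review">
  <div class="left comentariu">No rating given</div>
</div>
<div class="review">
  <span class="stelutze"><img src="/img/star_full.gif"><img src="/img/star_full.gif"></span>
  <div class="left comentariu">Loved it</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

paired = []
for text_div in soup.find_all("div", class_="left comentariu"):
    # Assumption: the rating span shares a parent element with the review text.
    container = text_div.parent
    span = container.find("span", class_="stelutze")
    if span is None:
        rating = None  # this review has no rating
    else:
        rating = sum(
            "star_full.gif" in img.get("src", "") for img in span.find_all("img")
        )
    paired.append((rating, text_div.get_text().strip()))

for rating, text in paired:
    print(rating, text)
```

Because each rating is looked up from its own review, a missing rating cannot shift the ones that follow it.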
Solution 1:[1]
I believe this should solve your issue. I have not tested it, but I don't see why it shouldn't work.
find_all returns a list of every element it finds, so you cannot call find_all on that result directly. Instead, first get the rating span of every review on the page, then iterate over those spans and get all the images inside each one, as you did before.
rating=0
count=0
rawRatings = soup.find_all("span", {"class": "stelutze"})
for i in rawRatings:
    rawRating = i.find_all("img")
    for j in rawRating:
        if "star_full.gif" in j.get("src"):
            rating += 1
        count += 1
        if count == 10:
            print(rating)
            rating = 0
            count = 0
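The count == 10 bookkeeping assumes every review shows exactly ten star images. A variant that avoids that assumption is to total the full stars once per span. This is only a sketch: the inline HTML and the helper name count_full_stars are made up for illustration, and the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating two rating spans; the real page may differ.
SAMPLE_HTML = """
<span class="stelutze"><img src="star_full.gif"><img src="star_empty.gif"></span>
<span class="stelutze"><img src="star_full.gif"><img src="star_full.gif"></span>
"""


def count_full_stars(span):
    """Return how many <img> tags inside `span` point at star_full.gif."""
    return sum(
        "star_full.gif" in img.get("src", "") for img in span.find_all("img")
    )


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# One rating per span, regardless of how many images each span holds.
ratings = [
    count_full_stars(span) for span in soup.find_all("span", class_="stelutze")
]
print(ratings)
```

Using img.get("src", "") also keeps the check from failing on an image that has no src attribute.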
If you have any questions, let me know.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | TheAmazingHAzza |
