Amazon page links with requests_html
I am scraping product details from Amazon. My program worked fine on the first day, but the next day, after I had run it two or three times, it started raising an error when moving on to the next page: it reaches the second page but not the third.
My program:
from requests_html import HTMLSession

session = HTMLSession()
urls = []
data = []
url = "https://www.amazon.com/s?k=gpu+graphics+card&i=computers-intl-ship&qid=1651764839&sprefix=gpu%2Ccomputers-intl-ship%2C453&ref=sr_pg_1"

def getdata(url):
    r = session.get(url)
    r.html.render(timeout=100)
    # products = r.html.find("div[data-asin]")
    #
    # for product in products:
    #     item = {}
    #     try:
    #         title = product.find('span.a-size-medium.a-color-base.a-text-normal')[0].text
    #     except:
    #         title = ''
    #     try:
    #         rating = product.find('span.a-icon-alt')[0].text
    #     except:
    #         rating = ''
    #     try:
    #         price = product.find('span.a-price-whole')[0].text
    #     except:
    #         price = ''
    #     item['Title'] = title
    #     item['rating'] = rating
    #     item['Price'] = price
    #     data.append(item)
    return r

def nextpage(link):
    nextpage_menu = link.html.find(".s-pagination-strip", first=True)
    if not nextpage_menu.find("span.s-pagination-item.s-pagination-next.s-pagination-disabled"):
        all_aTags = nextpage_menu.find("a")
        next_url = "https://www.amazon.com/" + str(all_aTags[len(all_aTags) - 1].attrs['href'])
        urls.append(next_url)
        return next_url
    else:
        urls.append('')
        return

while True:
    link = getdata(url)
    url = nextpage(link)
    if not url:
        break
    print(url)
print("++++++++++++++++++++++++++++")
print(urls)
print(data)
Error:
https://www.amazon.com//s?k=gpu+graphics+card&i=computers-intl-ship&page=2&qid=1651923364&sprefix=gpu%2Ccomputers-intl-ship%2C453&ref=sr_pg_1
Traceback (most recent call last):
  File "C:/Users/Muhammad/Documents/HTML_CSS_JAVASCRIPT/Scraping_modules/amazon_links.py", line 50, in <module>
    url = nextpage(link)
  File "C:/Users/Muhammad/Documents/HTML_CSS_JAVASCRIPT/Scraping_modules/amazon_links.py", line 38, in nextpage
    if not nextpage_menu.find("span.s-pagination-item.s-pagination-next.s-pagination-disabled"):
AttributeError: 'NoneType' object has no attribute 'find'
As you can see, it gets the link to the second page but goes no further. I checked the Elements panel in Chrome Developer Tools: the .s-pagination-strip class is present there, but find() returns nothing for it.
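The traceback shows that `find(".s-pagination-strip", first=True)` returned `None`, and the following `.find` call on that `None` is what raises. A minimal sketch of a more defensive version of `nextpage` is below. It does not explain why the strip is missing from the HTML the script received; it only stops the loop cleanly when that happens. The names `next_page_url`, `BASE_URL` and `DISABLED_NEXT` are hypothetical and not part of the original program, and `html` is assumed to behave like the `r.html` object from the question.

```python
from urllib.parse import urljoin

# Hypothetical constants, introduced only for this sketch.
BASE_URL = "https://www.amazon.com/"
DISABLED_NEXT = "span.s-pagination-item.s-pagination-next.s-pagination-disabled"


def next_page_url(html, base_url=BASE_URL):
    """Return the absolute URL of the next results page, or None.

    `html` is assumed to offer find(selector, first=True) and
    find(selector), like `r.html` in the question.
    """
    strip = html.find(".s-pagination-strip", first=True)
    if strip is None:
        # The pagination strip is not in the HTML that was received,
        # so there is nothing to follow; stop instead of raising.
        return None
    if strip.find(DISABLED_NEXT):
        # The "Next" control is disabled: this is the last page.
        return None
    anchors = strip.find("a")
    if not anchors:
        return None
    href = anchors[-1].attrs.get("href")
    # urljoin avoids the doubled slash that plain concatenation produces.
    return urljoin(base_url, href) if href else None
```

In the loop, `url = next_page_url(link.html)` would then end the crawl with `None` rather than an `AttributeError`. When that happens unexpectedly, printing part of `link.html.html` may help show what the script actually received, which can differ from what Chrome Developer Tools displays.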
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
