'How to get hidden price in amazon using python scraper
I want to create program that will scrape amazon product information and create a database. This program I want to create for computer details automatic scraping, but when I started for checking price my program starts work badly. It's only checked product name and rarely product price. My code for scraping.
from selectorlib import Extractor
import requests
import json
from time import sleep
# Create an Extractor by reading from the YAML file
e = Extractor.from_yaml_file('selectors.yml')
def scrape(url):
headers = {
'dnt': '1',
'upgrade-insecure-requests': '1',
'user-agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'sec-fetch-site': 'same-origin',
'sec-fetch-mode': 'navigate',
'sec-fetch-user': '?1',
'sec-fetch-dest': 'document',
'referer': 'https://www.amazon.com/',
'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
}
# Download the page using requests
print("Downloading %s"%url)
r = requests.get(url, headers=headers)
# Simple check to check if page was blocked (Usually 503)
if r.status_code > 400:
if "To discuss automated access to Amazon data please contact" in r.text:
print("Page %s was blocked by Amazon. Please try using better proxies\n"%url)
else:
print("Page %s must have been blocked by Amazon as the status code was %d"%(url,r.status_code))
return None
# Pass the HTML of the page and create
return e.extract(r.text)
# product_data = []
with open("urls.txt",'r') as urllist, open('output.json','w') as outfile:
for url in urllist.read().splitlines():
data = scrape(url)
if data:
json.dump(data,outfile)
outfile.write("\n")
# Formatting file
f = open("output.json", "r")
json_data = f.readlines()
tmp = ''.join(json_data)
tmp = tmp.replace('}\n{', '},\n{') # replace '}{' with '},{'
tmp = '[' + tmp + ']' # add brackets around it
v = open("output.json", "w")
v.write(tmp)
print(tmp) # print the tmp
v.close()
f.close()
selectors.yml
name:
css: '#productTitle'
type: Text
price:
css: '#price_inside_buybox'
type: Text
And for one site my program works. urls.txt
https://www.amazon.com/GIGABYTE-Graphics-WINDFORCE-GV-N3080GAMING-OC-12GD/dp/B09QDWGNPG/
But for other it can give price : null.
https://www.amazon.com/Intel-i7-12700KF-Desktop-Processor-Unlocked/dp/B09FXKHN7M/
How can I parse a hidden price?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
