Issue with Python web scraping when scraping the price from a product
So I have this code. I successfully extract each product name on the page.

```python
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq

page_url = "https://www.tenniswarehouse-europe.com/catpage-WILSONRACS-EN.html"
uClient = uReq(page_url)
page_soup = soup(uClient.read(), "html.parser")
uClient.close()

containers = page_soup.findAll("div", {"class": "product_wrapper cf rac"})
for container in containers:
    name = container.div.img["alt"]
    print(name)
```
And I'm trying to extract the prices from the HTML below. I tried the same approach as above but got an `IndexError: list index out of range`. I also tried targeting the div where the price is, and even the span, but to no avail.
```html
<div class="product_wrapper cf rac">
  <div class="image_wrap">
    <a href="https://www.tenniswarehouse-europe.com/Wilson_Pro_Staff_RF_97_V130_Racket/descpageRCWILSON-97V13R-EN.html">
      <img class="cell_rac_img" src="https://img.tenniswarehouse-europe.com/cache/56/97V13R-thumb.jpg" srcset="https://img.tenniswarehouse-europe.com/cache/112/97V13R-thumb.jpg 2x" alt="Wilson Pro Staff RF 97 V13.0 Racket" />
    </a>
  </div>
  <div class="text_wrap">
    <a class="name " href="https://www.tenniswarehouse-europe.com/Wilson_Pro_Staff_RF_97_V130_Racket/descpageRCWILSON-97V13R-EN.html">Wilson Pro Staff RF 97 V13.0 Racket</a>
    <div class="pricing">
      <span class="price"><span class="convert_price">264,89 €</span></span>
      <span class="msrp">SRP <span class="convert_price">300,00 €</span></span>
    </div>
    <div class="pricebreaks">
      <span class="pricebreak">Price for 2: <span class="convert_price">242,90 €</span> each</span>
    </div>
    <div>
      <p>Wilson updates the cosmetic of Federer's RF97 but keeps the perfect spec profile and sublime feel that has come to define this iconic racket. Headsize: 626cm². String Pattern: 16x19. Standard Length</p>
      <div class="cf">
        <div class="feature_links cf">
          <a class="review ga_event" href="/Reviews/97V13R/97V13Rreview.html" data-trackcategory="Product Info" data-trackaction="TWE Product Review" data-tracklabel="97V13R - Wilson Pro Staff RF 97 V13.0 Racket">TW Reviews</a>
          <a class="feedback ga_event" href="/feedback.html?pcode=97V13R" data-trackcategory="Product Info" data-trackaction="TWE Customer Review" data-tracklabel="97V13R - productName">Customer Reviews</a>
          <a class="video_popup ga_event" href="/productvideo.html?pcode=97V13R" data-trackcategory="Video" data-trackaction="Cat - Product Review" data-tracklabel="Wilson_Pro_Staff_RF_97_V130_Racket">Video</a>
        </div>
      </div>
    </div>
  </div>
</div>
</td>
<td class="cat_border_cell">
  <div class="product_wrapper cf rac">
```
Solution 1:[1]
I guess this will work for you:

```python
prices = page_soup.findAll("span", {"class": "convert_price"})
```

Then you'll have a list of all the prices on the page; you can access single prices with `prices[0]` ... `prices[len(prices) - 1]`.
If you want to remove the HTML tags from a price, use `prices[0].text`.
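As a self-contained illustration (using only the `pricing` fragment from the question's HTML, so no network request is needed), the lookup and the `.text` unwrapping behave like this:

```python
from bs4 import BeautifulSoup

# the "pricing" fragment copied from the question's HTML
html = """
<div class="pricing">
  <span class="price"><span class="convert_price">264,89 €</span></span>
  <span class="msrp">SRP <span class="convert_price">300,00 €</span></span>
</div>
"""

page_soup = BeautifulSoup(html, "html.parser")
prices = page_soup.findAll("span", {"class": "convert_price"})

print(prices[0].text)   # -> 264,89 €  (first price, tags stripped)
print(prices[-1].text)  # -> 300,00 €  (same as prices[len(prices) - 1])
```

Note that the SRP (the crossed-out list price) is also a `convert_price` span, so the sale price and the SRP land in the same list.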
But where is this HTML exactly from? Because the prices aren't on the page of the link you souped in your code, so in that soup you shouldn't find any prices.
The code above works for the HTML you provided there.
!SOLUTION!:
A way to solve this issue is by using the Selenium webdriver together with BeautifulSoup. I can't seem to find any other (easier) way.
First, install Selenium with `pip install selenium`.
Then download the driver for your browser (e.g. geckodriver for Firefox or chromedriver for Chrome).
What we do is click the "Set Selections" button that appears when opening the website; then we soup the page with the prices already loaded in. Enjoy my code below.
```python
from bs4 import BeautifulSoup
from selenium import webdriver

# use the path of your driver executable (a raw string avoids backslash escapes)
driver = webdriver.Firefox(executable_path=r"C:\Program Files (x86)\geckodriver.exe")
# for Chrome it's: driver = webdriver.Chrome(r"C:\Program Files (x86)\chromedriver.exe")

# open your website link
driver.get("https://www.tenniswarehouse-europe.com/catpage-WILSONRACS-EN.html")

# click the button that submits the location
# (on Selenium 4+ this is: driver.find_element(By.CLASS_NAME, "vat_entry_opt-submit"))
button1 = driver.find_element_by_class_name("vat_entry_opt-submit")
button1.click()

# now that the button is clicked, the prices are loaded in and we can soup this page
html = driver.page_source
page_soup = BeautifulSoup(html, "html.parser")

# extract all "pricing" blocks; use class "pricing" instead of "price"
# because the red sale prices are in class "sale"
pricing = page_soup.findAll("div", {"class": "pricing"})

# collect the first price of every product into a list named 'price'
price = [p.span.text for p in pricing]

# a single price is price[x], where x is the index of the product you want

driver.close()  # closes the webdriver window
```
Solution 2:[2]
Would this help? (This session assumes `import requests` and `from bs4 import BeautifulSoup` have already been run.)

```python
In [244]: soup = BeautifulSoup(requests.get('https://www.tenniswarehouse-europe.com/Wilson_Pro_Staff_RF_97_V130_Racket/descpageRCWILSON-97V13R-EN.html').content, 'html.parser')

In [245]: soup.find('h1', class_='name').text.strip()
Out[245]: 'Wilson Pro Staff RF 97 V13.0 Racket'

In [246]: soup.find(class_='convert_price').text.strip()
Out[246]: '242,90 €'
```
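Either way, the scraped prices are still strings in European number format. A small helper (not part of the original answers, just a common follow-up) converts them to floats for sorting or comparison:

```python
def parse_price(text: str) -> float:
    """Convert a scraped price like '264,89 €' into a float."""
    digits = text.replace("€", "").strip()              # drop the currency sign
    digits = digits.replace(".", "").replace(",", ".")  # '1.300,00' -> '1300.00'
    return float(digits)

print(parse_price("264,89 €"))    # -> 264.89
print(parse_price("1.300,00 €"))  # -> 1300.0
```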
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | idar |
