Python - Scraping WooCommerce does not return the price text

I am working on a price update check between my company's website and the Tango database (our management/administration system).

Because of that, I have to scrape prices from our website with Python, but I am having trouble scraping the WooCommerce price text. I tried both the requests_html and BeautifulSoup libraries, and both return (straight from the page source) the "bdi" price text as $0.00:

For example: https://hierroscasanova.com.ar/producto/cano-estructural-redondo/?attribute_pa_medida-1=3&attribute_pa_espesor=2-85&attribute_pa_unidad=kg

requests_html script:

from requests_html import HTMLSession
import csv
import time

link = 'https://hierroscasanova.com.ar/producto/cano-estructural-redondo/?attribute_pa_medida-1=3&attribute_pa_espesor=2-85&attribute_pa_unidad=kg'

s = HTMLSession()
r = s.get(link)
#print(r.text)

title = r.html.find('h1', first=True).full_text
price = r.html.find('span.woocommerce-Price-amount.amount bdi')[0].full_text
print(price)
price = r.html.find('span.woocommerce-Price-amount.amount bdi')[1].full_text
print(price)

Result:

$0.00
$0.00

BeautifulSoup script:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://hierroscasanova.com.ar/producto/cano-estructural-redondo/?attribute_pa_medida-1=3&attribute_pa_espesor=2-85&attribute_pa_unidad=kg")
soup = BeautifulSoup(page.text, "html.parser")

print(soup)

Result:

<span class="woocommerce-Price-amount amount"><bdi><span class="woocommerce-Price-currencySymbol">$</span>0.00</bdi>

PS: I noticed that when the full website is downloaded, it contains all the data and prices (not $0.00), so I do not know why the libraries are failing.

    <div class="woocommerce-variation-price"><span class="price"><span class="woocommerce-Price-amount amount"><bdi><span class="woocommerce-Price-currencySymbol">$</span>325.54</bdi></span> <small class="woocommerce-price-suffix">( IVA incluido )</small></span></div>

Thank you very much!



Solution 1:[1]

You can do it with Selenium, but I will show you how to do it with json and bs4. First we need the product ID:

def get_id(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, features='lxml')
    data_product_id = soup.find('form', class_='variations_form').get('data-product_id')
    return data_product_id
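
Note that soup.find() returns None when no matching form exists, so the chained .get() above would raise AttributeError on a page without variations. As a rough sketch of the same lookup that returns None instead, here is a dependency-free variant built on the standard library's HTMLParser rather than bs4. The names VariationFormParser and find_product_id, and the sample markup with ID 1234, are made up for illustration and are not taken from the site.

```python
from html.parser import HTMLParser


class VariationFormParser(HTMLParser):
    """Remembers data-product_id from the first <form> with class 'variations_form'."""

    def __init__(self):
        super().__init__()
        self.product_id = None

    def handle_starttag(self, tag, attrs):
        # Stop looking once an ID has been found.
        if tag != "form" or self.product_id is not None:
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "variations_form" in classes:
            self.product_id = attr_map.get("data-product_id")


def find_product_id(html):
    """Return the product ID as a string, or None if no variations form is present."""
    parser = VariationFormParser()
    parser.feed(html)
    return parser.product_id


# Hypothetical markup, only to show the call.
sample_html = '<form class="variations_form cart" data-product_id="1234"></form>'
print(find_product_id(sample_html))
```

The caller can then check for None and skip or log the product instead of crashing in the middle of a batch.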

Then, with this ID, we can get the price:

def get_price(product_id, payload):
    url = "https://hierroscasanova.com.ar/?wc-ajax=get_variation"
    payload = f"{payload}&product_id={product_id}"
    headers = {
      'accept': '*/*',
      'content-type': 'application/x-www-form-urlencoded; charset=UTF-8'
    }
    response = requests.request("POST", url, headers=headers, data=payload)
    json_data = json.loads(response.text)
    return json_data['display_price']
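
The function above assumes the response body is always a JSON object with a display_price key. A minimal sketch of a more defensive parse is shown below, assuming (this is not stated in the answer) that the endpoint may also return something else, such as a bare false when no variation matches or an HTML error page. The helper name extract_display_price is hypothetical, and the sample bodies are invented apart from the 325.54 price quoted in the question.

```python
import json


def extract_display_price(response_text):
    """Return display_price as a float, or None if the body has no usable price."""
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError:
        return None  # not JSON at all, e.g. an HTML error page
    if not isinstance(data, dict):
        return None  # e.g. a bare JSON false
    try:
        return float(data.get("display_price"))
    except (TypeError, ValueError):
        return None  # key missing or not numeric


print(extract_display_price('{"display_price": 325.54}'))
print(extract_display_price('false'))
```

Inside get_price this would replace the last two lines with return extract_display_price(response.text).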

Now all that remains is to prepare the parameters for the link, and we can check:

# On this site the attribute name itself is "attribute_pa_medida-1"; its value is "3".
attribute_pa_medida_1 = '3'
attribute_pa_espesor = '2-85'
attribute_pa_unidad = 'kg'
attributes = f'attribute_pa_medida-1={attribute_pa_medida_1}&attribute_pa_espesor={attribute_pa_espesor}&attribute_pa_unidad={attribute_pa_unidad}'
url = f'https://hierroscasanova.com.ar/producto/cano-estructural-redondo/?{attributes}'
print(get_price(get_id(url), attributes))
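
The query string can also be built from a dictionary with urllib.parse.urlencode, which keeps each attribute name and value separate and escapes any special characters. This is only a sketch of that alternative: the helper name build_payload and the product ID 1234 are hypothetical, while the attribute names and values come from the URL in the question.

```python
from urllib.parse import urlencode

# Attribute names and values as they appear in the product URL.
variation_attributes = {
    "attribute_pa_medida-1": "3",
    "attribute_pa_espesor": "2-85",
    "attribute_pa_unidad": "kg",
}


def build_payload(attributes, product_id=None):
    """Encode the attributes, plus product_id if given, as a form-encoded string."""
    fields = dict(attributes)  # copy so the caller's dict is not modified
    if product_id is not None:
        fields["product_id"] = product_id
    return urlencode(fields)


query = build_payload(variation_attributes)
print(query)
print(build_payload(variation_attributes, product_id=1234))  # 1234 is a made-up ID
```

The first string can be appended to the product URL after the question mark, and the second has the same shape as the body that get_price sends.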

Update, full code:

import requests
import json
from bs4 import BeautifulSoup


def get_id(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, features='lxml')
    data_product_id = soup.find('form', class_='variations_form').get('data-product_id')
    return data_product_id


def get_price(product_id, payload):
    url = "https://hierroscasanova.com.ar/?wc-ajax=get_variation"
    payload = f"{payload}&product_id={product_id}"
    headers = {
      'accept': '*/*',
      'content-type': 'application/x-www-form-urlencoded; charset=UTF-8'
    }
    response = requests.request("POST", url, headers=headers, data=payload)
    json_data = json.loads(response.text)
    return json_data['display_price']


# On this site the attribute name itself is "attribute_pa_medida-1"; its value is "3".
attribute_pa_medida_1 = '3'
attribute_pa_espesor = '2-85'
attribute_pa_unidad = 'kg'
attributes = f'attribute_pa_medida-1={attribute_pa_medida_1}&attribute_pa_espesor={attribute_pa_espesor}&attribute_pa_unidad={attribute_pa_unidad}'
url = f'https://hierroscasanova.com.ar/producto/cano-estructural-redondo/?{attributes}'
print(get_price(get_id(url), attributes))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Sergey K