Scrapy Splash - I am not able to get the value
I am trying to scrape this page: https://simple.ripley.com.pe/laptop-lenovo-ideapad-5-amd-ryzen-7-16gb-ram-256gb-ssd-14-2004286061746p?s=o
Everything else works, but I cannot get the values at this XPath:
//*[@id="panel-Especificaciones"]/div/div/table/tbody/tr[19]/td[2]
I think the content loads dynamically. It is a table with many rows, and I would like to get their values.
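A side note on the XPath itself, as a minimal sketch: browser developer tools usually show a `tbody` element that may be absent from the HTML Scrapy receives, and a positional index such as `tr[19]` stops matching when rows are added or removed. Selecting the row by its label text is often sturdier. The helper name `spec_value_xpath`, the example label, and the assumption that each row keeps its label in the first `td` are illustrative and not confirmed by the page.

```python
def spec_value_xpath(label, panel_id="panel-Especificaciones"):
    """Build an XPath for the value cell of the row whose first cell equals `label`."""
    # Quoting inside XPath string literals is not handled in this sketch.
    if '"' in label or '"' in panel_id:
        raise ValueError("double quotes are not supported in this sketch")
    return (
        f'//*[@id="{panel_id}"]//tr'              # any row under the panel, with or without tbody
        f'[normalize-space(td[1])="{label}"]'     # assumption: the label is in the first cell
        f'/td[2]//text()'                         # text of the value cell
    )


# Hypothetical use inside a spider callback:
#     values = response.xpath(spec_value_xpath("Procesador")).getall()
```

This only helps if the table is present in the response body at all; if the rows are injected by JavaScript, the page still has to be rendered first.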
Image: the page section I can't scrape
This is my spider code:
import scrapy
from scrapy_splash import SplashRequest
from numpy import nan

LUA_SCRIPT = """
function main(splash)
    splash.private_mode_enabled = false
    splash:go(splash.args.url)
    splash:wait(2)
    html = splash:html()
    splash.private_mode_enabled = true
    return html
end
"""


class RipleySpider(scrapy.Spider):
    name = "ripley"

    def start_requests(self):
        url = 'https://simple.ripley.com.pe/tecnologia/computacion/laptops?facet%5B%5D=Procesador%3AIntel+Core+i7'
        yield SplashRequest(url=url, callback=self.parse)

    def parse(self, response):
        for link in response.xpath("//div[@class='catalog-container']/div/a/@href"):
            yield response.follow(link.get(), callback=self.parse_products)

        # for href in response.xpath("//ul[@class='pagination']/li[last()]/a/@href").getall():
        #     yield SplashRequest(response.urljoin(href), callback=self.parse)

    def parse_products(self, response):
        titulo = response.css("h1::text").get()
        link = response.request.url
        sku = response.css(".sku-value::text").get()
        precio = response.css(".product-price::text").getall()
        if len(precio) == 1:
            precio_normal = nan
            precio_internet = precio[0]
            precio_tarjeta_ripley = nan
        elif len(precio) == 2:
            precio_normal = precio[0]
            precio_internet = precio[1]
            precio_tarjeta_ripley = nan
        elif len(precio) == 4:
            precio_normal = precio[0]
            precio_internet = precio[1]
            precio_tarjeta_ripley = precio[-1]
        try:
            # descripcion = response.css(".product-short-description::text").get()
            descripcion = response.xpath('//*[@id="panel-Especificaciones"]/div/div/table/tbody/tr[1]/td[2]/text()').get()
        except:
            descripcion = 'sin valor'
        yield {
            'Título': titulo,
            'Link': link,
            'SKU': sku,
            'Precio Normal': precio_normal,
            'Precio Internet': precio_internet,
            'Precio Tarjeta Ripley': precio_tarjeta_ripley,
            'Descripción': descripcion,
        }
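Separately from the rendering problem, the price branches in `parse_products` assign nothing when the selector returns zero or three values, so the `yield` would then fail with an `UnboundLocalError`. A minimal sketch of one way around that, using a hypothetical helper `split_prices` and `None` in place of `nan`, is to start from defaults and fill in only what was found. Treating every other list length as "all prices missing" is an assumption, not something the site guarantees.

```python
def split_prices(precio):
    """Map the list of scraped price strings onto the three named prices."""
    prices = {
        "precio_normal": None,
        "precio_internet": None,
        "precio_tarjeta_ripley": None,
    }
    if len(precio) == 1:
        prices["precio_internet"] = precio[0]
    elif len(precio) == 2:
        prices["precio_normal"], prices["precio_internet"] = precio
    elif len(precio) == 4:
        prices["precio_normal"] = precio[0]
        prices["precio_internet"] = precio[1]
        prices["precio_tarjeta_ripley"] = precio[-1]
    # Any other length (0, 3, ...) keeps the None defaults (assumption).
    return prices
```

In the spider this could be called as `split_prices(response.css(".product-price::text").getall())`, with the returned values placed into the yielded item.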
What solutions does Scrapy offer for this? Thanks in advance for your help.
P.S.: I'm using Docker with Splash on localhost:8050, and settings.py is configured according to the documentation.
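One possible direction, assuming the specifications table really is rendered by JavaScript: in the spider above, `LUA_SCRIPT` is defined but never passed to Splash, and `response.follow` creates a plain Scrapy request, so the product pages are downloaded without any rendering. The sketch below builds the keyword arguments that `SplashRequest` accepts for running a Lua script (`endpoint="execute"` plus `lua_source` inside `args`). The helper name `splash_execute_kwargs` and the `wait` value are illustrative; the wait time this site actually needs is unknown.

```python
# Lua script run by Splash; args.url is filled in by scrapy-splash,
# args.wait comes from the request arguments built below.
LUA_SCRIPT = """
function main(splash, args)
    assert(splash:go(args.url))
    assert(splash:wait(args.wait))
    return splash:html()
end
"""


def splash_execute_kwargs(lua_source, wait=2.0):
    """Keyword arguments that make SplashRequest run `lua_source` on Splash."""
    return {
        "endpoint": "execute",  # run a Lua script instead of the default render.html
        "args": {"lua_source": lua_source, "wait": wait},
    }


# Inside RipleySpider.parse, product links could then go through Splash:
#     yield SplashRequest(
#         response.urljoin(link.get()),
#         callback=self.parse_products,
#         **splash_execute_kwargs(LUA_SCRIPT),
#     )
```

If the rows only appear after a click or a later network call, a fixed wait may not be enough, and the script would need to wait for the table element itself.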
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
