How to add try/except (exception handling) to this Scrapy code

I want to scrape several URLs and retrieve all the information from each one. It works with a single URL, but with more than one URL I get an error (list index out of range). I was told to use try and catch (try/except in Python). What should the syntax look like?

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            # 'https://jdih.kaltimprov.go.id/produk_hukum/detail/9ef7f994-9db4'
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        yield {
            'Kategori': response.xpath('//*[@class="text-left"]/text()')[0].extract(),
            'Nomor': response.xpath('//*[@class="text-left"]/text()')[1].extract(),
            'Judul': response.xpath('//*[@class="text-left"]/text()')[2].extract().strip(),
            'Tanggal Diterapkan': response.xpath('//*[@class="text-left"]/text()')[3].extract(),
            'Tanggal Diundangkan': response.xpath('//*[@class="text-left"]/text()')[4].extract(),
            'Keterangan Status': response.xpath('//*[@class="text-left"]/p/text()')[0].extract(),
            'Statistik View': response.xpath('//*[@class="text-left"]/text()')[5].extract(),
            'Statistik Download': response.xpath('//*[@class="text-left"]/text()')[6].extract(),
            'Katalog': response.xpath('//*[@class="text-left"]/p/span/text()').extract(),
            'Abstraksi': response.xpath('//*[@class="text-left"]/p/text()')[1].extract(),
            'Lampiran': response.css('body > section > div > div > div > div.row > div.col-3 > a::attr(href)').extract()
        }


Solution 1:[1]

The problem is not scraping multiple URLs; it is your XPath selectors. For every field you take one element, by index, from the list of nodes the XPath matched. If a page has fewer matching nodes than the index you ask for, or none at all, the lookup raises "IndexError: list index out of range".
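
As a minimal sketch of the try/except the question asks about, the indexing can be wrapped in a small helper that returns a default value instead of raising. The helper name `pick`, its `default` parameter, and the sample value are hypothetical and not part of Scrapy; the helper works on any list, for example the result of `response.xpath(...).getall()`.

```python
def pick(values, index, default=None):
    """Return values[index], or default when that index does not exist."""
    try:
        return values[index]
    except IndexError:
        # Fewer matches than expected on this page: fall back to the default.
        return default


# Stand-in for response.xpath('//*[@class="text-left"]/p/text()').getall()
# on a page where the XPath matched nothing (sample data only).
matches = []
status = pick(matches, 0)          # None instead of an IndexError
category = pick(["Peraturan"], 0)  # the first (and only) sample value
```

Python has no `catch` keyword; the equivalent construct is `try` / `except`, and catching only `IndexError` keeps other bugs visible.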

I ran your code with two URLs:

import scrapy


class QuestionSpider(scrapy.Spider):
    name = 'question'
    allowed_domains = ['jdih.kaltimprov.go.id']
    start_urls = ['https://jdih.kaltimprov.go.id/produk_hukum/detail/9ef7f994-9db4',
                  'https://jdih.kaltimprov.go.id/produk_hukum/detail/5d0c7c0c-aa58']

    def parse(self, response):
        yield {
            'Kategori': response.xpath('//*[@class="text-left"]/text()')[0].extract(),
            'Nomor': response.xpath('//*[@class="text-left"]/text()')[1].extract(),
            'Judul': response.xpath('//*[@class="text-left"]/text()')[2].extract().strip(),
            'Tanggal Diterapkan': response.xpath('//*[@class="text-left"]/text()')[3].extract(),
            'Tanggal Diundangkan': response.xpath('//*[@class="text-left"]/text()')[4].extract(),
            'Keterangan Status': response.xpath('//*[@class="text-left"]/p/text()')[0].extract(),
            'Statistik View': response.xpath('//*[@class="text-left"]/text()')[5].extract(),
            'Statistik Download': response.xpath('//*[@class="text-left"]/text()')[6].extract(),
            'Katalog': response.xpath('//*[@class="text-left"]/p/span/text()').extract(),
            'Abstraksi': response.xpath('//*[@class="text-left"]/p/text()')[1].extract(),
            'Lampiran': response.css('body > section > div > div > div > div.row > div.col-3 > a::attr(href)').extract()
        }

It gives me an error:

  File "C:\Users\30463\desktop\quetsion3spider\quetsion3spider\spiders\question.py", line 17, in parse
    'Keterangan Status':response.xpath('//*[@class="text-left"]/p/text()')[0].extract(),
  File "D:\anaconda\lib\site-packages\parsel\selector.py", line 70, in __getitem__
    o = super(SelectorList, self).__getitem__(pos)
IndexError: list index out of range

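The second line of the traceback shows which selector failed: the XPath for 'Keterangan Status' matched nothing in that response, so even index [0] is out of range. I hope this helps.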
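
One way to make `parse` tolerate such pages, sketched under the assumption that a missing field should simply become `None`: run each XPath once, turn the result into a plain list, and look every field up by position with a bounds check. The names `build_item`, `at`, `TEXT_FIELDS`, and `PARAGRAPH_FIELDS` and the sample values are hypothetical; the field names and positions are copied from the spider above.

```python
# Position of each field in the list returned by
# response.xpath('//*[@class="text-left"]/text()').getall()
TEXT_FIELDS = {
    "Kategori": 0,
    "Nomor": 1,
    "Judul": 2,
    "Tanggal Diterapkan": 3,
    "Tanggal Diundangkan": 4,
    "Statistik View": 5,
    "Statistik Download": 6,
}
# Position of each field in the list returned by
# response.xpath('//*[@class="text-left"]/p/text()').getall()
PARAGRAPH_FIELDS = {"Keterangan Status": 0, "Abstraksi": 1}


def at(values, index):
    """Return the stripped string at index, or None if the list is too short.

    Note: this strips every field, whereas the original spider only
    strips 'Judul'.
    """
    return values[index].strip() if index < len(values) else None


def build_item(texts, paragraphs):
    """Build one item dict from the two lists of extracted strings."""
    item = {name: at(texts, i) for name, i in TEXT_FIELDS.items()}
    item.update({name: at(paragraphs, i) for name, i in PARAGRAPH_FIELDS.items()})
    return item


# Sample data only: a page with two text nodes and no <p> text at all.
item = build_item([" A ", "B"], [])
```

In the spider, `parse` would pass the two `getall()` results to `build_item` and yield the returned dict. 'Katalog' and 'Lampiran' can be added as before, because `.extract()` on the whole selection returns a list (possibly empty) and involves no indexing.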

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 studymakesmebetter