My Scrapy spider gets a 403 response when crawling this page:

scrapy.core.engine DEBUG: Crawled (403) https://publicholidays.com/us/school-holidays/

import scrapy

class UsSpider(scrapy.Spider):
    name = 'us_spider'
    start_urls = ['https://publicholidays.com/us/school-holidays/']

    def parse(self, response):
        # Debug output: the response object and the headers Scrapy sent
        print(response)
        print(response.request.headers)
        print("\n")

        yield {
            "hi": "hello"
        }

Solution 1:[1]

A 403 (Forbidden) response means the remote server refused your request, most likely because it identified the client as a bot rather than a real browser.
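
For context, Scrapy's HttpError middleware normally drops non-2xx responses before they ever reach parse(). A minimal sketch (reusing the spider name and URL from the question) to let the 403 through so you can inspect what was actually sent:

import scrapy

class UsSpider(scrapy.Spider):
    name = 'us_spider'
    start_urls = ['https://publicholidays.com/us/school-holidays/']

    # Let 403 responses through to parse() instead of having them
    # dropped by HttpErrorMiddleware, so they can be inspected.
    handle_httpstatus_list = [403]

    def parse(self, response):
        print(response.status)           # e.g. 403
        print(response.request.headers)  # headers Scrapy actually sent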

Try sending the same request headers a real browser would send. You can find them with your browser's DevTools:

  1. Open your browser's DevTools.
  2. Click the Network tab.
  3. Open the web page that you want to crawl.
  4. Click the entry for the page itself; its Type will be document.
  5. Scroll down until you see the Request Headers section.
  6. Send the same request headers from your spider (see the sketch after this list).
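
A minimal sketch of wiring such headers into the spider from the question. The header values below are generic placeholders from a typical Chrome session, not values verified to unblock this particular site; copy the actual ones you see in your own DevTools:

import scrapy

class UsSpider(scrapy.Spider):
    name = 'us_spider'
    start_urls = ['https://publicholidays.com/us/school-holidays/']

    # Per-spider settings; values are illustrative placeholders --
    # replace them with the headers from your own DevTools session.
    custom_settings = {
        'USER_AGENT': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                       'AppleWebKit/537.36 (KHTML, like Gecko) '
                       'Chrome/120.0.0.0 Safari/537.36'),
        'DEFAULT_REQUEST_HEADERS': {
            'Accept': ('text/html,application/xhtml+xml,application/xml;'
                       'q=0.9,image/avif,image/webp,*/*;q=0.8'),
            'Accept-Language': 'en-US,en;q=0.9',
        },
    }

    def parse(self, response):
        yield {"status": response.status}

If only some requests need these headers, you can instead pass headers=... to scrapy.Request in start_requests() rather than setting them globally.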

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution source:
[1] Solution 1: ce7in