scrapy.core.engine DEBUG: Crawled (403) https://publicholidays.com/us/school-holidays/
import scrapy


class UsSpider(scrapy.Spider):
    name = 'us_spider'
    start_urls = ['https://publicholidays.com/us/school-holidays/']

    def parse(self, response):
        # Inspect the response and the headers Scrapy actually sent.
        print(response)
        print(response.request.headers)
        print("\n")
        yield {
            "hi": "hello"
        }
Solution 1 [1]
A 403 response means the remote server refused your request. This often happens for security reasons: the server detected that the request came from an automated client rather than a regular browser.
Try sending the request headers a real user's browser would send. You can find them with the DevTools in your browser:
- Open your browser's DevTools.
- Click the Network tab.
- Open the web page that you want to crawl.
- Click the entry for the page itself; its type will be document.
- Scroll down until you see the Request Headers section.
- Send the same request headers from your spider (see the sketch after this list).
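As a minimal sketch of that last step: the header values below are placeholders modeled on a typical Chrome session, so replace them with whatever your own DevTools shows. Scrapy's USER_AGENT and DEFAULT_REQUEST_HEADERS settings, set per-spider via custom_settings, apply the headers to every request the spider makes.

import scrapy


class UsSpider(scrapy.Spider):
    name = 'us_spider'
    start_urls = ['https://publicholidays.com/us/school-holidays/']

    # Placeholder values: copy the real ones from your DevTools session.
    custom_settings = {
        'USER_AGENT': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                       'AppleWebKit/537.36 (KHTML, like Gecko) '
                       'Chrome/120.0.0.0 Safari/537.36'),
        'DEFAULT_REQUEST_HEADERS': {
            'Accept': ('text/html,application/xhtml+xml,application/xml;'
                       'q=0.9,image/avif,image/webp,*/*;q=0.8'),
            'Accept-Language': 'en-US,en;q=0.9',
        },
    }

    def parse(self, response):
        print(response.status)  # should be 200 once the server accepts the headers
        yield {"hi": "hello"}

If you only need the headers on some requests, you can instead override start_requests and pass headers=... to scrapy.Request. Keep in mind that matching headers is not always enough: some sites also check cookies or use other bot-detection techniques.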
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
| --- | --- |
| Solution 1 | ce7in |