Start URL headers changed after migrating from Scrapy to scrapy-redis
I have a Scrapy project that I want to convert to scrapy-redis. The main spider file is below:
import scrapy
from scrapy_redis.spiders import RedisSpider

class MySpider(RedisSpider):
    name = 'ScrapyBot'
    # Redis list that scrapy-redis reads start URLs from
    redis_key = 'myspider:start_urls'
    start_urls = []
    my_header = {
        "Host": "jd.com",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0",
    }

    def start_requests(self):
        # Attach the custom headers to every start request
        for url in MySpider.start_urls:
            yield scrapy.Request(
                url=url,
                headers=MySpider.my_header,
                callback=self.parse,
            )
The request works fine in plain Scrapy, but after adding the scrapy-redis part, the headers of the start request (captured with Fiddler) changed to the defaults:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en
User-Agent: Scrapy/1.6.0 (+https://scrapy.org)
Accept-Encoding: gzip,deflate
This causes the server to return a 403 error. How can I set the headers for start URLs in scrapy-redis?
Solution 1:[1]
You can set default headers in the settings.py file like this:
DEFAULT_REQUEST_HEADERS = {
    "Host": "jd.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0",
}
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Pavel Bely |
