I need assistance with Scrapy config
I need help. I'm trying to set up a Scrapy spider. What is the purpose of split("/")[-2]? What is the purpose of 'wb'?
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]  ###<<<<<< This.
        filename = f'quotes-{page}.html'
        with open(filename, 'wb') as f:  ###<<<<<< This.
            f.write(response.body)
        self.log(f'Saved file {filename}')
Solution 1:[1]
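The split will turn your URL string into a list where each item is the text between the slashes. Because the URL ends with a slash, the last item of that list is an empty string, so the bracket [-2] accesses the second-to-last item, which is the page number ('1' or '2'). The 'wb' in the open function means opening the file for writing in bytes (binary) mode instead of text mode. Text mode expects strings and encodes them on write, using an encoding that depends on the platform unless you pass one explicitly (it is often, but not always, UTF-8). Since response.body is already a bytes object, bytes mode writes it to the file unchanged, without interpreting or re-encoding it.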
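Both points can be checked without running a spider. The following is a minimal sketch, not part of the original answer: the body variable is a hypothetical stand-in for response.body, and the files are written to a temporary directory chosen for illustration.

```python
import os
import tempfile

# Same URL shape as in the spider; note the trailing slash.
url = "http://quotes.toscrape.com/page/1/"
parts = url.split("/")
# parts is ['http:', '', 'quotes.toscrape.com', 'page', '1', '']
page = parts[-2]   # '1' (the page number)
last = parts[-1]   # '' (empty string after the trailing slash)

# Hypothetical stand-in for response.body, which Scrapy provides as bytes.
body = "<html>caf\u00e9</html>".encode("utf-8")

out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, f"quotes-{page}.html")

# 'wb' writes the bytes exactly as they are.
with open(path, "wb") as f:
    f.write(body)

# Reading back in binary mode returns the identical bytes.
with open(path, "rb") as f:
    round_trip = f.read()

# Text mode ('w') expects str, so writing bytes raises TypeError.
text_mode_error = None
try:
    with open(os.path.join(out_dir, "text-mode.html"), "w") as f:
        f.write(body)
except TypeError as exc:
    text_mode_error = type(exc).__name__
```

If the URL had no trailing slash, the page number would be at index [-1] instead, so the index depends on the exact URL format.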
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Carlos Horn |
