Python Scrapy: how can I save the output to a (.csv) file?

I'm using Debian Bullseye (11.2) and want to save my spider's output to a (.csv) file. How can I do this? Here is my spider:

from scrapy.spiders import CSVFeedSpider


class CsSpiderSpider(CSVFeedSpider):
    name = 'cs_spider'
    allowed_domains = ['ocw.mit.edu']  # domain names only, without any path
    start_urls = ['http://ocw.mit.edu/courses/electrical-engineering-and-computer-science//feed.csv']
    # headers = ['id', 'name', 'description', 'image_link']
    # delimiter = '\t'

    # Do any adaptations you need here
    #def adapt_response(self, response):
    #    return response

    def parse_row(self, response, row):
        i = {}
        #i['url'] = row['url']
        #i['name'] = row['name']
        #i['description'] = row['description']
        return i


Solution 1:[1]

Here's an example of using Scrapy's FEEDS setting to export the scraped items to a CSV file:

import scrapy
from scrapy.crawler import CrawlerProcess


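class CsspiderSpider(scrapy.Spider):
    name = 'cs_spider'
    start_urls = ['http://ocw.mit.edu/courses/electrical-engineering-and-computer-science']

    def start_requests(self):
        # Send each start URL to parse_row instead of the default parse()
        for url in self.start_urls:
            yield scrapy.Request(url=url, callback=self.parse_row)

    def parse_row(self, response):
        # Each yielded dict becomes one CSV row; its keys become the column headers
        yield {
            'test': response.text,
        }


# Run the spider from a plain Python script and export its items through FEEDS
process = CrawlerProcess(
    settings={
        'FEEDS': {
            'data.csv': {
                'format': 'csv',
            },
        },
    }
)
process.crawl(CsspiderSpider)
process.start()  # blocks until the crawl has finished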

This saves the scraped items to data.csv in CSV format. To choose which columns are exported and in what order, use the FEED_EXPORT_FIELDS setting. You can read more about this in the Scrapy documentation on feed exports.

From the command line (inside your Scrapy project directory), you can run:

scrapy crawl cs_spider -o output.csv

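However, when running it this way, comment out everything from "process = CrawlerProcess(" downward; otherwise that code also runs when Scrapy imports the spider module. Alternatively, put those lines under if __name__ == '__main__': so they only run when you start the file with python. Note that -o appends to an existing file; on Scrapy 2.4 and later, -O overwrites it instead.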

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution source:
[1] Solution 1: dollar bill