Scrapy - change settings based on value of scraped item during runtime

I need to change the FEEDS parameter for the export of CSVs to AWS S3 depending on the value of scraped items. I tried to put a condition in settings.py, but it doesn't work because I can't import the item in settings.py (I get "cannot import name 'item' from ..."). I also tried from pipelines and from the spider:

if item.get('meta_source') is not None:
    FEEDS = {
        's3://ghr-crawler-ops/crawler_holding/meta.csv': {'format': 'csv'},
    }
else:
    FEEDS = {
        's3://ghr-crawler-ops/crawler_holding/results.csv': {'format': 'csv'},
    }

Basically I need to export two CSVs to AWS S3 from the same spider depending on the value of the scraped data. Exporting to my local computer works fine, but not to S3 (all the data ends up in one CSV).



Solution 1:[1]

This should be achievable with Item filtering, available since Scrapy 2.6. Something like the following:

from scrapy.extensions.feedexport import ItemFilter


class MetaItemFilter(ItemFilter):
    def accepts(self, item) -> bool:
        return item.get("meta_source") is not None


class ResultsItemFilter(ItemFilter):
    def accepts(self, item) -> bool:
        return item.get("meta_source") is None


FEEDS = {
    "s3://ghr-crawler-ops/crawler_holding/meta.csv": {
        "format": "csv",
        "item_filter": MetaItemFilter,
    },
    "s3://ghr-crawler-ops/crawler_holding/results.csv": {
        "format": "csv",
        "item_filter": ResultsItemFilter,
    }
}
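To see how this routes items, here is a minimal stand-alone sketch (plain Python, no Scrapy required) that mimics what the feed exporter does: each scraped item is offered to every configured feed, and a feed keeps the item only when its filter's accepts() method returns True. The feed names and sample items are illustrative; in Scrapy the filter classes would subclass ItemFilter as shown above.

```python
# Stand-ins for the ItemFilter subclasses above (no Scrapy import needed here).
class MetaItemFilter:
    def accepts(self, item):
        # Keep items that carry a meta_source value.
        return item.get("meta_source") is not None


class ResultsItemFilter:
    def accepts(self, item):
        # Keep items without a meta_source value.
        return item.get("meta_source") is None


# Each feed pairs an output buffer (standing in for the S3 CSV) with its filter.
feeds = {
    "meta.csv": ([], MetaItemFilter()),
    "results.csv": ([], ResultsItemFilter()),
}

# Hypothetical scraped items for illustration.
scraped_items = [
    {"title": "a", "meta_source": "sitemap"},
    {"title": "b"},
    {"title": "c", "meta_source": "rss"},
]

# This loop mirrors the exporter's behavior: offer every item to every feed,
# store it only where the feed's filter accepts it.
for item in scraped_items:
    for rows, item_filter in feeds.values():
        if item_filter.accepts(item):
            rows.append(item)
```

Items "a" and "c" land in meta.csv and item "b" in results.csv, which is exactly the two-feed split the question asks for, just applied per item instead of via a runtime change to settings.py.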

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 elacuesta