Scrapy and crochet
I know this question has already been discussed; however, I have not yet found a working solution.
I am working on a Django + Celery + Scrapy project, where tasks to scrape are issued by users.
Initially I had a task similar to this:
from scrapy.crawler import CrawlerProcess

@app.task(bind=True)
def run_task(self):
    ...
    process = CrawlerProcess(crawler_settings)
    process.crawl('quotes')
    process.start()  # the script will block here until the crawling is finished
Here, crawler_settings contained a single setting that wrote the execution output to a log file.
This worked for a single execution, but calling the task again raised a ReactorNotRestartable error. This happens because process.start() starts the Twisted reactor and, by default, stops it when the crawl finishes; a Twisted reactor cannot be started again once it has been stopped.
Afterwards I tried starting the reactor with process.start(stop_after_crawl=False) so that the reactor would not be stopped. However, this caused the Celery task to keep running even after the crawl had finished.
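To make the failure mode concrete, here is a toy model, not Twisted itself. ToyReactor and its run(work) method are hypothetical stand-ins, and the exception class only mimics the name of twisted.internet.error.ReactorNotRestartable. The sketch assumes the reactor is a single per-process object that refuses to run again once it has stopped.

```python
class ReactorNotRestartable(RuntimeError):
    """Stand-in for twisted.internet.error.ReactorNotRestartable."""


class ToyReactor:
    """Hypothetical model of a reactor that can run only once per process."""

    def __init__(self):
        self._started_before = False
        self.running = False

    def run(self, work):
        # A second run() is rejected, even though the first one has finished.
        if self._started_before:
            raise ReactorNotRestartable()
        self._started_before = True
        self.running = True
        try:
            work()
        finally:
            self.running = False  # stopped for good


# One shared, module-level instance, mirroring the global Twisted reactor.
reactor = ToyReactor()


def run_task(log):
    # Plays the role of process.start() inside the Celery task.
    reactor.run(lambda: log.append("crawled"))
```

In this model the first run_task call succeeds and the second one raises, because both calls share the same reactor object inside one long-lived worker process.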
I started searching for ways to reuse the created Twisted reactor and found crochet.
Then I created a function using crochet's wait_for decorator, as suggested in its docs:
from crochet import wait_for
from scrapy.crawler import CrawlerRunner

@wait_for(timeout=200)
def run_spider(spider_name, crawler_settings):
    runner = CrawlerRunner(crawler_settings)
    deferred = runner.crawl(spider_name)
    return deferred
I then used it in my Celery task:
@app.task(bind=True)
def run_task(self):
    ...
    run_spider('quotes', crawler_settings)
Unfortunately, this does not work. No log file is written, and after the timeout a TimeoutError is raised, which suggests that the crawl never ran at all (it took about 15 seconds with the 'regular' approach).
Can a kind soul help me? Thanks!
PS: In this explanation I use Scrapy's introductory spider ('quotes'), which is not the spider from my actual project. However, the 'quotes' example produces the same result, so I hope it helps to reproduce the problem.
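One workaround often suggested for the restart problem above is to give every crawl its own Python process, so that each crawl gets a fresh reactor. The sketch below is only an illustration under assumptions: it presumes a regular Scrapy project (a directory containing scrapy.cfg), and build_crawl_command, run_spider_in_subprocess, project_dir and log_file are hypothetical names. It has not been tested inside Celery.

```python
import subprocess
import sys


def build_crawl_command(spider_name, log_file=None):
    """Build the argv for `python -m scrapy crawl <spider_name>`."""
    cmd = [sys.executable, "-m", "scrapy", "crawl", spider_name]
    if log_file is not None:
        # -s NAME=VALUE overrides a single Scrapy setting for this run.
        cmd += ["-s", f"LOG_FILE={log_file}"]
    return cmd


def run_spider_in_subprocess(spider_name, project_dir, log_file=None, timeout=200):
    """Run one crawl in a child process and return its exit code.

    project_dir is assumed to be the directory that contains scrapy.cfg.
    subprocess.TimeoutExpired is raised if the crawl exceeds `timeout` seconds.
    """
    completed = subprocess.run(
        build_crawl_command(spider_name, log_file),
        cwd=project_dir,
        timeout=timeout,
        check=False,
    )
    return completed.returncode
```

A task body could then call run_spider_in_subprocess('quotes', project_dir) instead of process.start(). The trade-off is the cost of starting a new interpreter for each crawl, in exchange for never reusing a stopped reactor.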
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
