Python Scrapy ValueError(f"No <form> element found in {response}")
I want to scrape data from all pages, but after scraping the first page the spider raises an error.
The code I wrote is as below:
import scrapy
from scrapy.http import FormRequest
from ..items import PracticeItem


class Practice(scrapy.Spider):
    name = 'quotes'
    start_urls = ['https://quotes.toscrape.com/login']

    def parse(self, response):
        token = response.css('form input::attr(value)').extract_first()
        return FormRequest.from_response(response, formdata={
            'csrf': token,
            'username': 'demo',
            'password': 'demo'
        }, callback=self.start_scraping)

    def start_scraping(self, response):
        items = PracticeItem()
        all_tags = response.css('div.quote')
        for x in all_tags:
            quote = x.css('span.text::text').extract()
            title = x.css('.author::text').extract()
            tag = x.css('.tag::text').extract()
            items["quote"] = quote
            items["title"] = title
            items["tag"] = tag
            yield items
        next_page = response.css('li.next a::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
However, this is what I get after the first page is crawled:
2022-04-06 00:04:21 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/2/> (referer: http://quotes.toscrape.com/)
2022-04-06 00:04:21 [scrapy.core.scraper] ERROR: Spider error processing <GET http://quotes.toscrape.com/page/2/> (referer: http://quotes.toscrape.com/)
Traceback (most recent call last):
File "f:\bse\data science\python\pythonproject\venv\lib\site-packages\twisted\internet\defer.py", line 857, in _runCallbacks
current.result = callback( # type: ignore[misc]
File "F:\BSE\Data Science\Python\pythonProject\practice\practice\spiders\pra.py", line 16, in parse
return FormRequest.from_response(response, formdata={
File "f:\bse\data science\python\pythonproject\venv\lib\site-packages\scrapy\http\request\form.py", line 64, in from_response
form = _get_form(response, formname, formid, formnumber, formxpath)
File "f:\bse\data science\python\pythonproject\venv\lib\site-packages\scrapy\http\request\form.py", line 104, in _get_form
raise ValueError(f"No <form> element found in {response}")
ValueError: No <form> element found in <200 http://quotes.toscrape.com/page/2/>
2022-04-06 00:04:21 [scrapy.core.engine] INFO: Closing spider (finished)
Solution 1:[1]
On this line:
yield response.follow(next_page, callback=self.parse)
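you're telling Scrapy to process the next page with the self.parse callback, which logs in to the site. Pages after the login page contain no <form>, so FormRequest.from_response raises the ValueError. Process the next page with the self.start_scraping callback instead: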
yield response.follow(next_page, callback=self.start_scraping)
Also, I think you need to move items = PracticeItem() inside the for loop, so that each quote gets its own item instead of every iteration reusing the same object.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | gangabass |
