Category "scrapy"

Nested for loop stops on a null value using Scrapy

I use a nested for loop to get data for weekdays. If one of the days is 'null', the loop stops at that day and doesn't get the rest of the days. I believe that I
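
A minimal sketch of the usual fix: .get() returns None for a missing node, so substitute a default and keep iterating instead of letting the inner loop break. The selectors and class names below are assumptions, not the asker's actual page structure.

    import scrapy

    class HoursSpider(scrapy.Spider):
        name = "hours"
        start_urls = ["https://example.com/venues"]  # placeholder URL

        def parse(self, response):
            for venue in response.css("div.venue"):
                hours = {}
                for day in venue.css("li.day"):
                    name = day.css("span.name::text").get()
                    # .get() yields None for a missing value; fall back to
                    # a default rather than stopping the inner loop.
                    hours[name] = day.css("span.hours::text").get() or "closed"
                yield {"venue": venue.css("h2::text").get(), "hours": hours}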

How to use Scrapy to scrape Google Play reviews of applications?

I wrote this spider to scrape app reviews from Google Play. I am partially successful in this: I am able to extract only the name, date, and review. My ques
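
For context, a hedged sketch of such a spider. Note that Google Play renders most reviews with JavaScript, so plain Scrapy only sees the few reviews embedded in the initial HTML; the selectors and app id below are assumptions.

    import scrapy

    class PlayReviewsSpider(scrapy.Spider):
        name = "play_reviews"
        # Placeholder app id; most reviews load dynamically and will not
        # be visible to Scrapy without a rendering service.
        start_urls = [
            "https://play.google.com/store/apps/details?id=com.example.app"
        ]

        def parse(self, response):
            # The selectors here are assumptions about the page markup.
            for review in response.css("div[data-review-id]"):
                yield {
                    "name": review.css("span.author-name::text").get(),
                    "date": review.css("span.review-date::text").get(),
                    "review": review.css("div.review-body::text").get(),
                }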

How to do Scrapy historical output comparison using Spidermon

So Scrapinghub is releasing a new feature for Scrapy quality assurance. It says it has historical comparison features where it can detect if the current scrape
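
A rough sketch of a Spidermon monitor that compares the current run against the previous one. It assumes Spidermon's stats-history collector is enabled in settings so that self.data.stats_history is populated, and the 50% threshold is arbitrary.

    from spidermon import Monitor, MonitorSuite, monitors

    @monitors.name("Historical item count")
    class HistoricalItemCountMonitor(Monitor):
        @monitors.name("Item count did not drop by more than half")
        def test_item_count_drop(self):
            current = self.data.stats.get("item_scraped_count", 0)
            # stats_history is assumed to be populated by Spidermon's
            # stats-history collector; it is empty on the first run.
            history = getattr(self.data, "stats_history", [])
            if not history:
                self.skipTest("no previous run to compare against")
            previous = history[0].get("item_scraped_count", 0)
            self.assertGreaterEqual(current, previous * 0.5)

    class SpiderCloseMonitorSuite(MonitorSuite):
        monitors = [HistoricalItemCountMonitor]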

Scrapy: No module named 'scrapy.contrib'

I've looked everywhere for a solution to this. I didn't use to have a problem calling "from scrapy.contrib..." but now it throws this error: File "<frozen
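
The scrapy.contrib package was deprecated in Scrapy 1.0 and later removed, so the fix is to switch to the current module paths, for example:

    # Old (Scrapy < 1.0) imports that now fail:
    #   from scrapy.contrib.spiders import CrawlSpider, Rule
    #   from scrapy.contrib.linkextractors import LinkExtractor

    # Current equivalents:
    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.linkextractors import LinkExtractor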

PyInstaller error on Scrapy?

I am using Scrapy by importing it in a script. I built the Python file using PyInstaller. After building it I ran the file ./new.py, but this error pops up: FileNotFoundError: [
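
That FileNotFoundError is typically Scrapy's bundled data files (such as VERSION and mime.types) missing from the frozen build. A sketch of a PyInstaller hook that collects them; the hook file must sit in a directory passed via --additional-hooks-dir.

    # hook-scrapy.py -- a sketch, to be placed in a directory passed to
    # PyInstaller via --additional-hooks-dir.
    from PyInstaller.utils.hooks import collect_data_files, collect_submodules

    # Bundle Scrapy's data files (VERSION, mime.types) and any submodules
    # that PyInstaller's static analysis misses.
    datas = collect_data_files("scrapy")
    hiddenimports = collect_submodules("scrapy")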

No such file or directory error using PyInstaller and Scrapy

I have a Python script that uses Scrapy and I want to make it into an exe file using PyInstaller. The exe file is generated without any error, but when I open it
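
Beyond missing data files (see the hook sketch above), a frozen script cannot shell out to the scrapy CLI, so running the spider in-process is the usual pattern. The project and spider names below are placeholders.

    # Sketch: run the spider in-process instead of invoking the
    # `scrapy crawl` CLI, which is unavailable inside a PyInstaller exe.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    from myproject.spiders.example import ExampleSpider  # placeholder import

    if __name__ == "__main__":
        process = CrawlerProcess(get_project_settings())
        process.crawl(ExampleSpider)
        process.start()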

Auth failing - 999 - HTTP status code is not handled or not allowed

I am using Scrapy, and I am getting an ignored response URL. I just see this in the output console: DEBUG: Ignoring response <999 https://www.mywebsite.com
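
999 is a non-standard status code (often an anti-bot response), so Scrapy's HttpErrorMiddleware ignores it by default. Allowing it through lets the callback inspect the response; the spider below is a sketch with a placeholder name.

    import scrapy

    class MySpider(scrapy.Spider):
        name = "mysite"
        start_urls = ["https://www.mywebsite.com"]
        # Let 999 responses reach parse() instead of being ignored.
        # (Project-wide alternative: HTTPERROR_ALLOWED_CODES = [999].)
        handle_httpstatus_list = [999]

        def parse(self, response):
            if response.status == 999:
                self.logger.warning("Got 999 (likely blocked) at %s",
                                    response.url)
                return
            # ... normal parsing for 200 responses ...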

Scrapy: follow external links with one depth only

Imagine I am crawling foo.com. foo.com has several internal links to itself, and it has some external links like: foo.com/hello, foo.com/contact, bar.com, holla.c
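
One way to sketch this: follow internal links normally, follow an external link once, and mark the resulting response so it is not crawled further. The domain check and flag name are assumptions.

    import scrapy
    from urllib.parse import urlparse

    class FooSpider(scrapy.Spider):
        name = "foo"
        start_urls = ["https://foo.com"]

        def parse(self, response, external=False):
            yield {"url": response.url}
            if external:
                return  # one hop into the external site, stop here
            for href in response.css("a::attr(href)").getall():
                url = response.urljoin(href)
                # Anything outside foo.com counts as external (assumption).
                is_external = urlparse(url).netloc != "foo.com"
                yield response.follow(
                    url, callback=self.parse,
                    cb_kwargs={"external": is_external},
                )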

Scrapy: How to output items in a specific JSON format

I output the scraped data in JSON format. The default Scrapy exporter outputs a list of dicts in JSON format. The item type looks like: [{"Product Name":"Product1", "Cate
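
A sketch of one approach: an item pipeline that buffers items and writes whatever top-level JSON shape is needed, here grouping by "Category Name" as in the question's example. The output filename is a placeholder.

    import json
    from itemadapter import ItemAdapter

    class GroupedJsonPipeline:
        """Writes a single JSON object keyed by category instead of the
        default top-level list produced by the JSON feed exporter."""

        def open_spider(self, spider):
            self.groups = {}

        def process_item(self, item, spider):
            adapter = ItemAdapter(item)
            key = adapter.get("Category Name")  # field name from the question
            self.groups.setdefault(key, []).append(adapter.asdict())
            return item

        def close_spider(self, spider):
            # Placeholder output path.
            with open("items.json", "w", encoding="utf-8") as f:
                json.dump(self.groups, f, ensure_ascii=False, indent=2)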

Get all link text and href on a page using Scrapy

    class LinkSpider(scrapy.Spider):
        name = "link"

        def start_requests(self):
            urlBasang = "https://bloomberg.com"
            yield scrapy.Request(url =
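
A sketch of how the rest of such a spider might look, yielding every anchor's text and href; the URL comes from the excerpt, the rest is an assumption.

    import scrapy

    class LinkSpider(scrapy.Spider):
        name = "link"
        start_urls = ["https://bloomberg.com"]

        def parse(self, response):
            for anchor in response.css("a"):
                yield {
                    "text": anchor.css("::text").get(default="").strip(),
                    "href": anchor.attrib.get("href"),
                }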

Which parse method Scrapy uses to parse start_urls

I want Scrapy to scrape some start URLs and then follow the links on those pages according to rules. My spider inherits from CrawlSpider and has start_urls
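
CrawlSpider uses parse() internally to drive its rules, so responses for start_urls should be handled in parse_start_url() rather than by overriding parse(). A minimal sketch; the rule pattern and selectors are placeholders.

    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.linkextractors import LinkExtractor

    class MySpider(CrawlSpider):
        name = "mysite"
        start_urls = ["https://example.com"]
        rules = (
            Rule(LinkExtractor(allow=r"/articles/"),
                 callback="parse_item", follow=True),
        )

        def parse_start_url(self, response):
            # Called for each response from start_urls; the rules still
            # run on these pages afterwards.
            return self.parse_item(response)

        def parse_item(self, response):
            yield {"url": response.url,
                   "title": response.css("title::text").get()}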