Category "web-crawler"

Scraping multiple sites in one scrapy-spider

I am scraping 6 sites with 6 different spiders, but now I have to scrape all of these sites in a single spider. Is there a way of scraping multiple links in the same
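One common pattern, sketched here under assumptions, is to keep all six start URLs in one spider and route each response to a site-specific parser chosen by domain. The domains and parser functions below are hypothetical placeholders. In Scrapy itself the same routing could be done by yielding `scrapy.Request(url, callback=...)` with a different callback per site; the sketch keeps the routing logic in plain Python so it runs without Scrapy:

```python
from urllib.parse import urlparse


# Hypothetical per-site parsers. In a real Scrapy spider these would be
# callback methods receiving a Response rather than a raw HTML string.
def parse_site_a(html):
    return {"site": "a", "length": len(html)}


def parse_site_b(html):
    return {"site": "b", "length": len(html)}


# Hypothetical domains mapped to the parser that understands their layout.
PARSERS = {
    "site-a.example": parse_site_a,
    "site-b.example": parse_site_b,
}


def dispatch(url, html):
    """Choose the parser for a URL by its host name and run it."""
    host = urlparse(url).hostname or ""
    parser = PARSERS.get(host)
    if parser is None:
        raise ValueError(f"no parser registered for {host!r}")
    return parser(html)
```

Adding a seventh site then means writing one more parser and one more dictionary entry, rather than another spider.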

Crawling a website with requests: problem with Object.setPrototypeOf

I'm trying to crawl a website with a Python requests GET call and am having a problem with Object.setPrototypeOf. I added a User-Agent to the header, but it still gives me the code below

Web Scraping Google Scholar Author profiles

I have used the scholarly package and parsed the author names generated in question 3 with its search-by-author-name method to get the author profiles, including all

google_play_scraper cannot crawl all reviews

Although I followed the reviews_all code from https://github.com/JoMingyu/google-play-scraper, I still cannot get all reviews, only a few, and they are not even sorted by d

get contents of a webpage

I want to gather some data from a website that uses technologies I'm not familiar with, for example from this URL. My problem is that I cannot use methods l

StopIteration Error while using scholarly.pprint function

I am trying to extract the public Google Scholar profiles of certain professors. I have a list of professors' names and I am using it with the help of the scholarly packa
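`StopIteration` is what Python raises when `next()` is called on an iterator that has no items left, which typically happens when a name search yields no matches. Assuming the author search returns a generator, a minimal guard is to pass a default value to `next()` so that names without results come back as `None` instead of crashing the loop. The `search_author` function below is a hypothetical stand-in, not scholarly's implementation:

```python
def search_author(name):
    """Hypothetical stand-in for a generator-based author search:
    yields a profile for known names and nothing otherwise."""
    known = {"Jane Doe": {"name": "Jane Doe", "affiliation": "Example University"}}
    if name in known:
        yield known[name]


def first_result(name):
    # next() with a default returns None instead of raising StopIteration
    # when the search generator produces no results.
    return next(search_author(name), None)


professors = ["Jane Doe", "No Such Person"]
profiles = {name: first_result(name) for name in professors}
# Names whose search returned nothing, kept for manual follow-up.
missing = [name for name, profile in profiles.items() if profile is None]
```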

Is there a tree-like structure where nodes can appear multiple times and even be ancestors of themselves?

I'm crawling some web pages, recursively collecting all the existing links, and I would like to preserve, in some kind of structure, the history of links I've had to
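What the question describes is a directed graph rather than a tree: a page can be linked from several parents, and link cycles let a page be its own ancestor. A minimal sketch using an adjacency set per page (the `LinkGraph` class and the page paths are illustrative, not from the question):

```python
from collections import defaultdict


class LinkGraph:
    """Directed graph of pages: an edge a -> b means page a links to b.
    Unlike a tree, a page may have several parents and cycles are allowed."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_link(self, src, dst):
        self.edges[src].add(dst)

    def reachable(self, start):
        """Every page reachable from start by following links (iterative DFS)."""
        seen, stack = set(), list(self.edges.get(start, ()))
        while stack:
            page = stack.pop()
            if page not in seen:
                seen.add(page)
                stack.extend(self.edges.get(page, ()))
        return seen

    def is_own_ancestor(self, page):
        # True when following links from page eventually leads back to it.
        return page in self.reachable(page)


g = LinkGraph()
g.add_link("/", "/about")
g.add_link("/about", "/contact")
g.add_link("/contact", "/")  # cycle back to the home page
g.add_link("/", "/blog")
```

The `seen` set is what keeps the traversal finite despite the cycle; the same set doubles as the crawler's "already visited" record.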

Python Scrapy web scraping: problem getting the URL inside an onclick element that loads AJAX content

I am a beginner at web scraping with Scrapy. I am trying to scrape user reviews for a specific book from goodreads.com. I want to scrape all of the reviews about b
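When the URL only appears inside an `onclick` handler, one approach is to read the attribute (in Scrapy, for instance, with a selector such as `response.css("a::attr(onclick)").getall()`) and pull the path out with a regular expression. A minimal sketch, assuming the handler contains a quoted, site-relative path; the sample handler text in the usage is hypothetical, not copied from goodreads.com:

```python
import re
from urllib.parse import urljoin

# Matches the first quoted, site-relative path inside an onclick handler.
# The handler format is an assumption; inspect the real attribute first.
QUOTED_PATH = re.compile(r"""['"](/[^'"]+)['"]""")


def url_from_onclick(onclick, base="https://www.goodreads.com/"):
    """Return the absolute URL embedded in an onclick value, or None."""
    match = QUOTED_PATH.search(onclick)
    return urljoin(base, match.group(1)) if match else None


# Hypothetical handler text for illustration.
example = url_from_onclick(
    "new Ajax.Request('/book/reviews/123?page=2', {asynchronous:true}); return false;"
)
```

The resulting absolute URL can then be requested directly, which is usually simpler than trying to trigger the JavaScript.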

Need the total number of pages on a website to iterate over, but Selenium keeps timing out

I'm trying to fix a data crawler that was working perfectly until the last couple of weeks. The script consists of two parts: one that retrieves the links of the ar

Puppeteer not giving accurate HTML code for page with shadow roots

I am trying to download the HTML for the website intersight.com/help/, but Puppeteer is not returning the HTML with the hrefs that we can see on the page (e

Creating a Python web scraper to get metadata for Google Play Store apps

I am very new to Python and am really interested in learning more. I have been given a task by a course I am currently taking... Please write a small Python scr

Code unreachable when adding Selenium ChromeOptions

For some reason my Python code displays as unreachable after adding a series of WebDriver options. Does anyone know why this is happening and how it can be fixe

Scrapy: follow external links to a depth of one only

Imagine I am crawling foo.com. foo.com has several internal links to itself, and it has some external links, like: foo.com/hello, foo.com/contact, bar.com, holla.c
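One way to get "one hop outside the site" is to decide based on the page a link was found on: follow everything found on internal pages, and follow nothing found on external pages. A minimal sketch of that rule in plain Python, assuming subdomains of foo.com count as internal (Scrapy's global `DEPTH_LIMIT` setting would also cap internal pages, which is not what the question wants). In a spider callback, you would yield follow-up requests only for the links this function returns:

```python
from urllib.parse import urlparse

SITE = "foo.com"  # the site being crawled, as in the question


def is_internal(url, site=SITE):
    """True for the site itself and its subdomains, e.g. www.foo.com."""
    host = urlparse(url).hostname or ""
    return host == site or host.endswith("." + site)


def links_to_follow(page_url, links, site=SITE):
    """Follow every link found on an internal page; follow nothing found on
    an external page, so the crawl stops one hop outside the site."""
    if not is_internal(page_url, site):
        return []
    return list(links)
```

Matching on `"." + site` rather than a bare suffix avoids treating a host like notfoo.com as internal.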

Crawling the Twitter API for specific tweets

I am trying to crawl Twitter for specific keywords, which I have put into the array keywords = ["art", "railway", "neck"]. I am trying to search for these wo
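Rather than issuing one search per keyword, the keywords can be combined into a single query. A minimal sketch, assuming the standard Twitter search operators, where space-separated terms must all match and `OR` matches either term; the local `matching_keywords` helper is illustrative, for tagging each returned tweet with the keywords it actually contains:

```python
import re

keywords = ["art", "railway", "neck"]


def build_query(words):
    # Joining with OR asks for tweets containing any of the keywords.
    return " OR ".join(words)


def matching_keywords(text, words):
    """Keywords that appear as whole words in a tweet's text (case-insensitive)."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return [w for w in words if w in tokens]
```

Whole-word matching keeps "art" from matching inside words such as "party".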

Can one specify a file content-type to download using Wget?

I want to use wget to download files linked from the main page of a website, but I only want to download text/html files. Is it possible to limit wget to text/
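Wget's `-A` accept list matches file name suffixes or patterns rather than MIME types, so one workaround, swapped in here as an assumption rather than a wget feature, is to check each response's Content-Type header yourself before saving it. A minimal Python sketch of that check using the standard library's `email.message.Message` header parser:

```python
from email.message import Message


def is_html(content_type_header):
    """Return True if a Content-Type header value denotes text/html.

    Message.get_content_type() drops parameters such as charset and
    lowercases the result, so "text/html; charset=UTF-8" counts as HTML.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type() == "text/html"
```

Parsing the header this way avoids hand-written string splitting that would miss upper-case values or trailing parameters.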

Which parse method does Scrapy use to parse start_urls?

I want Scrapy to scrape some start URLs and then follow the links in those pages according to rules. My spider inherits from CrawlSpider and has start_urls

How to crawl question and answer of Google People Also Ask with Selenium and Python?

I used this code to crawl the questions and answers of Google People Also Ask. I want to use it to generate ideas for writers, but I can't get exactly that element, in t
