Category "web-crawler"

Is there a tree-like structure where nodes can appear multiple times and even be ancestors of themselves?

I'm crawling some web pages, recursively getting all the existing links, and I would like to preserve in some kind of structure the history of links I've had to

Python Scrapy web scraping: problem getting the URL inside an onclick element that has AJAX content

I am a beginner at web scraping with Scrapy. I am trying to scrape user reviews for a specific book from goodreads.com. I want to scrape all of the reviews about b

Need the total number of pages on a website to iterate, but Selenium keeps timing out

I'm trying to fix a data crawler that was working perfectly until the last couple of weeks. The script consists of two parts, one that retrieves the links of the ar

Puppeteer not giving accurate HTML code for page with shadow roots

I am trying to download the HTML code for the website intersight.com/help/. But Puppeteer is not returning the HTML code with hrefs as we can see in the page (e

Creating a Python web scraper to get metadata for Google Play Store apps

I am very new to Python and am really interested in learning more. I have been given a task by a course I am doing currently... Please write a small Python scr

Code unreachable when adding Selenium ChromeOptions

For some reason my Python code displays as unreachable after adding a series of WebDriver options. Does anyone know why this is happening and how it can be fixe

Scrapy: follow external links to a depth of one only

Imagine I am crawling foo.com. foo.com has several internal links to itself, and it has some external links like: foo.com/hello foo.com/contact bar.com holla.c

Crawling Twitter API for specific tweets

I am trying to crawl Twitter for specific keywords, which I have made into the array keywords = ["art", "railway", "neck"]. I am trying to search for these wo

Can one specify a file content-type to download using Wget?

I want to use wget to download files linked from the main page of a website, but I only want to download text/html files. Is it possible to limit wget to text/

Which parse method does Scrapy use to parse start_urls?

I want Scrapy to scrape some start URLs and then follow the links in those pages according to rules. My spider inherits from CrawlSpider and has start_urls

How to crawl the questions and answers of Google People Also Ask with Selenium and Python?

I used this code to crawl the questions and answers of Google People Also Ask. I want to use that to generate ideas for writers. But I can't get exactly that element, in t