Category "web-scraping"

Why am I getting the error "UnicodeEncodeError: 'charmap' codec can't encode character '\u25b2' in position 84811: character maps to <undefined>"?

I'm getting UnicodeEncodeError: 'charmap' codec can't encode character '\u200b' in position 756: character maps to <undefined> error while running this code: from bs4 imp

How to use Selenium for web scraping Google Flights?

I'm trying to pull the airline names and prices of a specific flight. I'm having trouble with the XPath and/or using the right HTML tags because when I run the

Exception has occurred: WebDriverException // Session deleted because of page crash

While I was able to get help on another issue with a Python-based web scraper, another issue comes up when I run the code: page crash. Can someone tell me why it

google_play_scraper cannot crawl all reviews

Although I followed the reviews_all code from https://github.com/JoMingyu/google-play-scraper, I still cannot get all reviews, only a few, and not even sorted by d

How to ignore infobox when scraping title from Wikipedia anchor text?

I am trying to scrape the first 20 links on a Wikipedia page but I want to ignore the infobox on the right side. It has a 'table' tag. Here is what I have so fa

Webscraping Google Search Results Using Google API - Returns same result over and over again

My problem: Hi everyone, I am attempting to develop my very first web scraper using the Google API and Beautiful Soup in Python. The aim is for the scraper to

I disabled loading images in Chrome while using WebDriver with Selenium and now can't enable it

I disabled loading images in Chrome while using WebDriver with Selenium and now can't enable it. I was using Python to scrape Instagram, so I thought it would be

Is Scrapy Asynchronous by Default?

I recently ran a spider in my project, but I feel like Scrapy is waiting until one page is finished before moving on to the next one. If I am correct in Scrapy's natu

How to open a new tab using Python Playwright by feeding it a list of URLs?

According to the Playwright documentation, the way to open a new tab in the browser is as shown in the scrap_post_info() function. However, it failed to do so.

Deploy Scrapy Project with Streamlit

I have a Scrapy spider that scrapes product information from Amazon based on the product link. I want to deploy this project with Streamlit and take the produc

Get contents of a webpage

I want to gather some data from a website that uses some technologies that I don't know, for example from this url. So my problem is that I cannot use methods l

Scraping Google Play reviews

I am new to programming and I have recently tried to scrape Google Play reviews with Python using the following program: from bs4 import BeautifulSoup import u

Puppeteer, awaiting a selector, and returning data from within

I am loading a page, intercepting its requests, and when a certain element shows up I stop loading and extract the data I need... Here is the problem that I am

Webscraping returning character(empty)

I have the following code: link = "https://www.funda.nl/en/koop/maastricht/" page = read_html(link) name <- page %>% html_nodes(".search-result__header-t

Pandas' read_html not reading HTML tables

I am trying to see if I can use, and only use, Pandas' read_html function to scrape HTML tables from the following website: https://www.baseball-reference.com/t

StopIteration Error while using scholarly.pprint function

I am trying to extract Google Scholar public profiles of certain professors. I have a list of professors' names and I am using it with the help of the scholarly packa

IMPORTHTML not working for retrieving the table that shows the player rankings [duplicate]

I have used the IMPORTHTML function in Google Sheets successfully many times, but sometimes I have had no luck in getting it to work. I am doi

How to scrape data from a dynamic website containing JavaScript using Python?

I am trying to scrape data from https://www.doordash.com/food-delivery/chicago-il-restaurants/ The idea is to scrape all the data regarding the different resta