Category "web-scraping"

Using R code to scrape data from a webpage into an Excel file

I have written code in R that is supposed to retrieve certain information from a website and import it into an Excel file. I have used it for one website and

Not getting all the HTML data in DevTools on the Zillow website (and others)

I'm trying to scrape real estate data from Zillow. When I look at the HTML code in DevTools, most of the links to the house details are not displayed in the htm

Cannot scrape the correct aspect ratio of the image - Python

I'm having trouble extracting an image from a "Manga" website using Python. Below is the element example on the website: img id="comic" class="loading" onerro

HTTP error 403 in Python 3 web scraping the publications

This is the traceback of the error that is happening when I am trying to put the URL of the publication. It works for the regular websites such as Stack Overflo

How to scrape wikipedia text from <p> without id or class?

I am scraping a Wikipedia text but the <p> does not have any class or id: import requests as r from bs4 import BeautifulSoup as bs url=r.get("https://en.

How to use scrapy to scrape google play reviews of applications?

I wrote this spider to scrape reviews of apps from google play. I am partially successful in this. I am able to extract the name, date, and review only. My ques

How to do Scrapy historical output comparison using Spidermon

So Scrapinghub is releasing a new feature for Scrapy quality assurance. It says it has historical comparison features where it can detect if the current scrape

removing `\n` using bs4 get_text()

from bs4 import BeautifulSoup # current output as below """ 'DOMINGUEZ, JONATHAN D. VS. RAMOS,\n SILVIA M' """ # d
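
One common way to collapse an embedded newline and the spaces that follow it is to split on whitespace and rejoin with single spaces. A minimal sketch, assuming the string below is what `get_text()` returned; the helper name `collapse_whitespace` is hypothetical and not part of bs4:

```python
def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace (spaces, tabs, newlines) with one space."""
    # str.split() with no arguments splits on any whitespace run and drops
    # leading/trailing whitespace, so rejoining normalizes the text.
    return " ".join(text.split())


# Assumed input: the output quoted in the question excerpt.
raw = "DOMINGUEZ, JONATHAN D. VS. RAMOS,\n SILVIA M"
cleaned = collapse_whitespace(raw)
print(cleaned)
```

This operates on the extracted string only, so it works regardless of how the text was pulled out of the document.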

Trouble modifying the language option in selenium python bindings

I've created a script in python in combination with selenium to scrape different app names from google play store and they all are coming through when I execute

Can't grab coordinates from ArcGIS iframe in a webpage using requests

I've created a script to get coordinates (-119.412 49.023 in this case) from a map located in a webpage using requests module. When I try using my script below

how to use same cookies over multiple requests when using python requests

I am new to Python requests and am using it to scrape a website and get to a certain webpage. First I log in and then I do a few requests for other webpages: im

OSError: [Errno 22] Invalid argument: 'downloaded/misc/jquery.js?v=1.4.4'

tfp = open(filename, 'wb') OSError: [Errno 22] Invalid argument: 'downloaded/misc/jquery.js?v=1.4.4' Can anyone help me with this error? I figure it has somet
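
The `?` in the path comes from the URL's query string, and some filesystems (Windows in particular) reject `?` in file names, which is one likely cause of this error. A minimal sketch, assuming the local path is derived directly from the URL; the helper name `local_path_for` is hypothetical:

```python
from urllib.parse import urlsplit


def local_path_for(url_or_path: str) -> str:
    """Return only the path component, dropping any ?query and #fragment."""
    return urlsplit(url_or_path).path


# Assumed input: the path quoted in the error message.
filename = local_path_for("downloaded/misc/jquery.js?v=1.4.4")
print(filename)
```

Dropping the query string means two URLs that differ only in their `?v=...` suffix map to the same local file, which may or may not be acceptable for a given download script.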

Scraping content from urls in dataframe using R

Sorry, I'm relatively new to R and don't know it very well yet. I have also seen that similar questions have been asked more often. However, the corresponding s

Why can't I scrape table data in order?

I'm trying to scrape table data off of this website: https://www.nfl.com/standings/league/2019/REG I have working code (below), however, it seems like the table

Python - BeautifulSoup - How to return two different elements or more, with different attributes?

HTML Example <html> <div book="blue" return="abc"> <h4 class="link">www.example.com</h4> <p class="author">RODRIGO</p> </

Python get string from an html page

I have to create an array which contains all the elements within title="", for example: title="xxxxx", title="xxx2", title='xxx4', etc... I need to get xxxx,
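
Collecting every `title` attribute value can be done with the standard library's HTML parser, which handles both single- and double-quoted attributes. A minimal sketch, assuming the page source is already available as a string; the class name `TitleCollector` and the sample markup are hypothetical:

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect the value of every title="..." attribute in document order."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for
        # attributes written without a value.
        for name, value in attrs:
            if name == "title" and value is not None:
                self.titles.append(value)


# Hypothetical sample standing in for the real page source.
sample = '<a title="xxxxx">a</a><img title="xxx2"><span title=\'xxx4\'>b</span>'
collector = TitleCollector()
collector.feed(sample)
titles = collector.titles
print(titles)
```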

How can I download images on a page using puppeteer?

I'm new to web scraping and want to download all images on a webpage using puppeteer: const puppeteer = require('puppeteer'); let scrape = async () => {

Can't manipulate dataframe in pandas

Don't understand why I can't do even the most simple data manipulation with this data I've scraped. I've tried all sorts of methods to manipulate the data but

soup.find() function is not working, how do I find the ID value?

If I have the following HTML that was found with BeautifulSoup, can someone explain why print(soup.find(id="style")) or print(soup.find(id="id")) does not work

How to scrape all data from first page to last page using beautifulsoup

I have been trying to scrape all data from the first page to the last page, but it returns only the first page as the output. How can I solve this? Below is my