Category "web-scraping"

Need help parsing link from iframe using BeautifulSoup and Python3

I have this url here, and I'm trying to get the video's source link, but it's located within an iframe. The video url is https://ndisk.cizgifilmlerizle.com... i

Download bing image search results using python (custom url)

I want to download bing search images using python code. Example URL: https://www.bing.com/images/search?q=sketch%2520using%20iphone%2520students My python co

Unable to get correct CSS tag for webscraping in R using SelectorGadget

I am trying to web scrape data in R from this url but cannot seem to get the correct css tag. For now I just need help retrieving the professor's name. Any help

How to accept facebook cookies using python selenium?

I have a problem clicking the facebook accept cookies button on facebook creator studio website. Cookies are shown only when the page is opened by the program,

Proxy: Selenium + Python + Firefox

I'm trying to set up a proxy for Selenium and Firefox (using Python). I have seen many tutorials and posts how to do it, but most are outdated or don't work for

Scrapy returns ValueError SelectorList is not supported

I think the problem is when I try to enter each url spell with response.follow in the loop, but idk why, it passes the around 500 links perfectly to extract_xpa

Scrapy: scraping large PDF files without keeping response body in memory

Let's say I want to scrape a PDF of 1GB with Scrapy, then using the scraped PDF data in further Requests down the line.. how do I do this without keeping the 1G

How do i save the authentication with puppeteer?

I need to talk to a telegram bot, with my web app. So i decided to do a web scrapping, i do not know if its the best strategy. When i try to access the telegram

How can I handle pagination with Scrapy and Splash, if the href of the button is javascript:void(0)

I am trying to scrape the names and links of universities from this website: https://www.topuniversities.com/university-rankings/world-university-rankings/2021,

How can I handle pagination with Scrapy and Splash, if the href of the button is javascript:void(0)

I am trying to scrape the names and links of universities from this website: https://www.topuniversities.com/university-rankings/world-university-rankings/2021,

Setting a default for nosuchelementexception for multiple variables in python

So I am scrapping multiple rows of a table and many of them are either available or not for different pages. What I want to do is to detect which field is not a

Scrape and change data in date in BeautifulSoup

I am scraping data from different web pages and there are several dates in this data. The code allowing me to have the information that I want looks like this,

Can I change a drop down item from a list in Jsoup and submit it?

I have a site I'm trying to scrape with Jsoup that has monthly and yearly selection boxes where the data changes when a different month or year is selected. Edi

Can't bypass cloudflare with python cloudscraper

I faced with cloudflare issue when I tried to parse the website. I got this code import cloudscraper url = "https://author.today" scraper = cloudscraper.create

Not able to replicate AJAX using Python Requests

I am trying to replicate ajax request from a web page (https://droughtmonitor.unl.edu/Data/DataTables.aspx). AJAX is initiated when we select values from dropdo

Not able to replicate AJAX using Python Requests

I am trying to replicate ajax request from a web page (https://droughtmonitor.unl.edu/Data/DataTables.aspx). AJAX is initiated when we select values from dropdo

Unable to iterate through list using BeautifulSoup

I am doing some experiments with Python3.6 in Mac and BeautifulSoup. I am trying to build a simple program to scrap song lyrics from a URL and store them as pla

Pulling company name from webpage within <a> tag

I am trying to streamline my data collection by using Python 3.7 and BeautifulSoup to pull company name, if that company is approved or other, and if they are m

I get InvalidURL: URL can't contain control characters when I try to send a request using urllib

I am trying to get a JSON response from the link used as a parameter to the urllib request. but it gives me an error that it can't contain control characters. h

NFT List Prices

OpenSea allows users to buy and sell NFTs. From OpenSea, you can view the prices of listed NFTs within a project. When an NFT is listed, is the listed price sto