Category "beautifulsoup"

UnicodeDecodeError 'utf-8' codec can't decode byte 0x92 in position 2893: invalid start byte

I'm trying to open a series of HTML files in order to get the text from the body of those files using BeautifulSoup. I have about 435 files that I wanted to run

Scraping Wikipedia for information with Beautiful Soup

I managed to scrape Wikipedia for the names of US Presidents using Beautiful Soup, after which I converted them into a dataframe. names=[all the president's name] wik

Extract business hours from Google using only beautiful soup

Goal: Extract the business hours and their closed status from the Google Search results. Screenshot with the highlighted working hours and closed status (example U

How to use re.sub to replace matches with a series of numbers

I'm trying to remove all HTML tags from a text file and, after some processing on the text, I have to put the HTML tags back in the text. So I thought maybe rep

BeautifulSoup: How to find all href links in a div with a class?

On disboard.org/ I am trying to collect all hrefs within a div with a class of 'server-name'. Source-Code: def scrape(): url = 'https://disboard.org/search

Need help parsing link from iframe using BeautifulSoup and Python3

I have this url here, and I'm trying to get the video's source link, but it's located within an iframe. The video url is https://ndisk.cizgifilmlerizle.com... i

Download Bing image search results using Python (custom URL)

I want to download Bing search images using Python code. Example URL: https://www.bing.com/images/search?q=sketch%2520using%20iphone%2520students My python co

How to speed up python data parsing?

I have the following task: I need to parse the site in the form of a taxonomy and save it to CSV, that is, upload 24,000 links; that is, I uploaded 800 links to a file,

Scrape and change data in date in BeautifulSoup

I am scraping data from different web pages and there are several dates in this data. The code allowing me to have the information that I want looks like this,

Unable to iterate through list using BeautifulSoup

I am doing some experiments with Python 3.6 on a Mac and BeautifulSoup. I am trying to build a simple program to scrape song lyrics from a URL and store them as pla

Pulling company name from webpage within <a> tag

I am trying to streamline my data collection by using Python 3.7 and BeautifulSoup to pull the company name, whether that company is approved or other, and whether they are m

Extracting text from PDF url file with Python

I want to extract text from a PDF file that's on a website. The website contains a link to the PDF doc, but when I click on that link it automatically downloads that fi

I get InvalidURL: URL can't contain control characters when I try to send a request using urllib

I am trying to get a JSON response from the link used as a parameter to the urllib request, but it gives me an error that it can't contain control characters. h

ImportError: cannot import name 'CharsetMetaAttributeValue'

from bs4 import BeautifulSoup html_doc=''' html_doc = """ <html><head><title>The Dormouse's story</title></head> <body> <

Add quote to every item in a Python List

I have the following Python list from BeautifulSoup (for example): [Basketball, Ipad Pro, Macbook Pro, Racket] I need to add quotes to every item in the list,

How to fix Deprecation Warning: executable_path has been deprecated, please pass in a Service object

I am pretty new to coding and Python. The scraper starts off well and works, until at some point (after around 1 minute or so) it stops and gives this erro

How to specify needed fields using Beautiful Soup and properly call upon website elements using HTML tags

I have been trying to create a web scraping program that will return the values of the Title, Company, and Location from job cards on Indeed. I finally am not r

Extract everything inside tag, but not tag itself

I'm using BeautifulSoup to scrape text from a website, but I only want the <p> tags for organization. However, I can't use text.findAll('p'), because the

Download a captcha image without an extension

How can I download this captcha image with PIL or another image manipulation library? I tried several ways but I can't download the image. from PIL import Imag

How would I go about incorporating an if statement in item list?

I need to find the phone numbers on this website. I have come to the conclusion that I need to write an if statement, but I'm not really sure how to do that sinc