How to read text off a website using python (Simple explanation)
I'm looking to make a program that can get the text off a website when given the website's URL. I would like to be able to get all the text between the <p> tags. Everything I have found online seems to overcomplicate this, and it involves some coding in C, which I am not well versed in. If there's anything I can clarify or that is unclear in the question, please let me know in the comments.
In the best case, I would like the code to look something like this:
import WebReader as WR  # WebReader is a hypothetical module describing the wished-for API
StringOfWebText = WR.getParagraphText("WebsiteURL")
Solution 1:[1]
You probably want to look into something like BeautifulSoup paired with requests. You can then extract the text from a page with a simple solution like this:
import requests
from bs4 import BeautifulSoup

r = requests.get("https://google.com")  # download the page
soup = BeautifulSoup(r.text, "html.parser")  # parse the HTML
print(soup.text)  # all text in the document, with the tags stripped
BS4 also has built-in tag searching (for example find_all()) and other useful features, if you need them.
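To get only the text between <p> tags, as the question asks, tag searching can be combined with get_text(). The following is a minimal sketch, assuming bs4 is installed; get_paragraph_text is a hypothetical helper name chosen to mirror the wished-for getParagraphText, and it takes an HTML string rather than a URL so it can be tried without network access.

```python
from bs4 import BeautifulSoup


def get_paragraph_text(html: str) -> str:
    """Return the text of every <p> element in `html`, one paragraph per line.

    get_paragraph_text is a hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all("p") returns every <p> tag in document order
    paragraphs = soup.find_all("p")
    # get_text() joins the text of nested tags such as <b>; strip() trims the edges
    return "\n".join(p.get_text().strip() for p in paragraphs)


sample = (
    "<html><body><h1>Title</h1>"
    "<p>First <b>bold</b> para.</p><p> Second </p>"
    "</body></html>"
)
print(get_paragraph_text(sample))
```

To work from a URL, pass it the downloaded page, e.g. get_paragraph_text(requests.get(url).text), as in the snippet above.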
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Grace |
