Have some questions on Python regular expressions
Ok, I've got this script:
#!/usr/bin/python3
import requests
from bs4 import BeautifulSoup
import re
def get_html(url):
    r = requests.get(url)
    return r.content
url = "https://foxnews.com/"
html = get_html(url)
pattern = re.compile(r'(https?\/\/).*\.(jpg|jpeg|png)')
matches = re.findall(pattern, html)
for match in matches:
    print(match)
But I get an error: TypeError: cannot use a string pattern on a bytes-like object
How can I use a regex to find image links in the HTML I scraped from a website?
Solution 1:[1]
r.content returns the response body as bytes. A string pattern only works on a str, so use the decoded Unicode text instead:
def get_html(url):
    r = requests.get(url)
    return r.text  # instead of r.content
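
Once the text is a str, the pattern itself may still need attention. As written in the question, it has no colon after `https?`, and its capturing groups make `re.findall` return tuples of groups rather than whole URLs. Here is a minimal sketch of one way to handle both. The sample HTML, the `IMAGE_URL` pattern and the `find_image_urls` helper are illustrative assumptions, not part of the original answer. Decoding the sample bytes stands in for what `r.text` gives you.

```python
import re

# Hypothetical sample standing in for the downloaded page (no network needed).
sample_bytes = (
    b'<img src="https://example.com/a.jpg">'
    b'<img src="http://example.com/img/b.png">'
)

# Non-capturing group (?:...) so findall returns whole URLs, not just groups.
# The character class stops each match at whitespace, quotes, or angle brackets.
IMAGE_URL = re.compile(r'https?://[^\s"\'<>]*\.(?:jpg|jpeg|png)')


def find_image_urls(html_text):
    """Return every image URL found in a str of HTML."""
    return IMAGE_URL.findall(html_text)


# A str pattern applied to bytes raises the TypeError from the question.
try:
    IMAGE_URL.findall(sample_bytes)
    raised = False
except TypeError:
    raised = True

# Decoding first (which is what r.text does for you) makes the search work.
urls = find_image_urls(sample_bytes.decode("utf-8"))
print(raised)
print(urls)
```

A regex like this only finds absolute URLs that end in one of the listed extensions. Relative paths and URLs with query strings after the extension would need a different pattern or an HTML parser.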
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | monk |
