Have some questions on Python regular expressions

Ok, I've got this script:

#!/usr/bin/python3

import requests
from bs4 import BeautifulSoup
import re

def get_html(url):
    r = requests.get(url)
    return r.content

url = "https://foxnews.com/"
html = get_html(url)

pattern = re.compile(r'(https?\/\/).*\.(jpg|jpeg|png)')
matches = re.findall(pattern, html)

for match in matches:
    print(match)

But I get an error: TypeError: cannot use a string pattern on a bytes-like object

How can I use a regex to find image links in the HTML I scraped from a website?



Solution 1:[1]

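r.content returns the response body as bytes, while your pattern is a str, and re will not apply a str pattern to a bytes object. Use r.text instead, which gives you the decoded Unicode text, and run the regex on that.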

def get_html(url):
    r = requests.get(url)
    return r.text # instead of r.content
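
Once the TypeError is gone, the pattern from the question will probably still not print what you expect: it has no colon after https?, the greedy .* can run across several links on one line, and re.findall returns tuples of the groups whenever the pattern has capturing groups. Here is a minimal sketch of one way around that, not the only one. The names IMAGE_URL, find_image_links and sample, and the example.com URLs, are made up for illustration, and decoding as UTF-8 is an assumption about the page. It runs on a hard-coded sample so it does not need the network.

```python
import re

# Non-capturing group (?:...) so findall returns the whole URL, not a tuple.
# The character class stops at whitespace, quotes and angle brackets, so a
# single match cannot run across several attributes or tags.
IMAGE_URL = re.compile(r'https?://[^\s"\'<>]+?\.(?:jpg|jpeg|png)', re.IGNORECASE)

def find_image_links(html):
    """Return the image URLs found in html, given as str or bytes."""
    if isinstance(html, bytes):
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html = html.decode("utf-8", errors="replace")
    return IMAGE_URL.findall(html)

# Hypothetical sample standing in for r.content (bytes).
sample = b'<img src="https://example.com/a.jpg"><img src="http://example.com/b.PNG">'
links = find_image_links(sample)
for link in links:
    print(link)
```

With the script from the question you would call find_image_links(get_html(url)). Since BeautifulSoup is already imported there, parsing the img tags with it is likely more robust than a regex, but that is a separate approach from the one asked about.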

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 monk