'Download bing image search results using python (custom url)

I want to download bing search images using python code.

Example URL: https://www.bing.com/images/search?q=sketch%2520using%20iphone%2520students

My python code generates an url of bing search as shown in example. Next step, is to download all images shown in that link on my local desktop.

In my project i am generating some words in python and my code generates bing image search URL. All i need is to download images shown on that search page using python.



Solution 1:[1]

To download an image, you need to make a request to the image URL that ends with .png, .jpg etc.

But Bing provides a "m" attribute inside the <a> element that stores needed data in the JSON format from which you can parse the image URL that is stored in the "murl" key and download it afterward.

image

To download all images locally to your computer, you can use 2 methods:

# bs4

for index, url in enumerate(soup.select(".iusc"), start=1):
    img_url = json.loads(url["m"])["murl"]
    image = requests.get(img_url, headers=headers, timeout=30)
    query = query.lower().replace(" ", "_")
    
    if image.status_code == 200:
        with open(f"images/{query}_image_{index}.jpg", 'wb') as file:
            file.write(image.content)
# urllib

for index, url in enumerate(soup.select(".iusc"), start=1):
    img_url = json.loads(url["m"])["murl"]
    query = query.lower().replace(" ", "_")

    opener = req.build_opener()
    opener.addheaders=[("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36")]
    req.install_opener(opener)
    req.urlretrieve(img_url, f"images/{query}_image_{index}.jpg")

In the first case, you can use context manager with open() to load the image locally. In the second case, you can use urllib.request.urlretrieve method of the urllib.request library.

Also, make sure you're using request headers user-agent to act as a "real" user visit. Because default requests user-agent is python-requests and websites understand that it's most likely a script that sends a request. Check what's your user-agent.

Note: An error might occur with the urllib.request.urlretrieve method where some of the request has got a captcha or something else that returns an unsuccessful status code. The biggest problem is it's hard to test for response code while requests provide a status_code method to test it.


Code and full example in online IDE:

from bs4 import BeautifulSoup
import requests, lxml, json

query = "sketch using iphone students"

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
    "q": query,
    "first": 1
}

# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36"
}

response = requests.get("https://www.bing.com/images/search", params=params, headers=headers, timeout=30)
soup = BeautifulSoup(response.text, "lxml")

for index, url in enumerate(soup.select(".iusc"), start=1):
    img_url = json.loads(url["m"])["murl"]
    image = requests.get(img_url, headers=headers, timeout=30)
    query = query.lower().replace(" ", "_")
    
    if image.status_code == 200:
        with open(f"images/{query}_image_{index}.jpg", 'wb') as file:
            file.write(image.content)

Using urllib.request.urlretrieve.

from bs4 import BeautifulSoup
import requests, lxml, json
import urllib.request as req

query = "sketch using iphone students"

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
    "q": query,
    "first": 1
}

# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36"
}

response = requests.get("https://www.bing.com/images/search", params=params, headers=headers, timeout=30)
soup = BeautifulSoup(response.text, "lxml")

for index, url in enumerate(soup.select(".iusc"), start=1):
    img_url = json.loads(url["m"])["murl"]
    query = query.lower().replace(" ", "_")

    opener = req.build_opener()
    opener.addheaders=[("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36")]
    req.install_opener(opener)
    req.urlretrieve(img_url, f"images/{query}_image_{index}.jpg")

Output:

Demonstration of uploading files to a folder

Solution 2:[2]

edit your code to find the designated image url and then use this code

use urllib.request

import urllib.request as req

    imgurl ="https://i.ytimg.com/vi/Ks-_Mh1QhMc/hqdefault.jpg"

    req.urlretrieve(imgurl, "image_name.jpg")

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Community