urllib.request.urlretrieve downloading the wrong files from Instagram

When scraping Instagram, my code downloads an image from the post URL even when the post is a video. I'm confused about why it grabs a seemingly random .jpg instead of the .mp4 that is handled in the else branch.

time.sleep(5)
posts = []
links = driver.find_elements_by_tag_name('a')
for link in links:
    post = link.get_attribute('href')
    if '/p/' in post:
        posts.append(post)

#get videos and images

download_url = ''
for post in posts:  
    driver.get(post)
    shortcode = driver.current_url.split("/")[-2]
    time.sleep(7)
    if driver.find_element_by_css_selector("img[style='object-fit: cover;']") is not None:
        download_url = driver.find_element_by_css_selector("img[style='object-fit: cover;']").get_attribute('src')
        urllib.request.urlretrieve( download_url, '{}.jpg'.format(shortcode))
    else:
        download_url = driver.find_element_by_css_selector("video[type='video/mp4']").get_attribute('src')
        urllib.request.urlretrieve( download_url, '{}.mp4'.format(shortcode))
    time.sleep(5)


Solution 1:[1]

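Selenium's find_element methods never return None; they raise NoSuchElementException when nothing matches. That means the "is not None" check always passes whenever the call returns, and the else branch can never run. I suppose we can use a "try-except" block instead: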

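from urllib.request import urlretrieve
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

try:
    # Look for the image first; find_element raises if nothing matches
    download_url = driver.find_element(By.CSS_SELECTOR, "img[style='object-fit: cover;']").get_attribute('src')
    print(download_url)
    urlretrieve(download_url, '{}.jpg'.format(shortcode))
except NoSuchElementException:
    # No matching image, so fall back to the video element
    download_url = driver.find_element(By.CSS_SELECTOR, "video[type='video/mp4']").get_attribute('src')
    print(download_url)
    urlretrieve(download_url, '{}.mp4'.format(shortcode))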
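
If the image branch still wins on video posts, one possible cause (an assumption, not verified against Instagram's current markup) is that a video post also contains a cover image matching the img selector. A minimal sketch of checking for the video first is below. choose_media is a hypothetical helper, not part of Selenium or urllib, and it takes plain lists of src values so the selection logic can be tried without a browser.

```python
def choose_media(video_srcs, image_srcs, shortcode):
    """Return (url, filename), preferring a video source over an image.

    Hypothetical helper: video_srcs and image_srcs are lists of 'src'
    attribute values (entries may be None) collected from the page.
    """
    # Check videos first, on the assumption that a video post may also
    # contain a cover <img> that would otherwise be picked up.
    for src in video_srcs:
        if src:
            return src, '{}.mp4'.format(shortcode)
    for src in image_srcs:
        if src:
            return src, '{}.jpg'.format(shortcode)
    # Nothing usable was found on the page
    return None, None


# Possible usage with Selenium 4 (shown as comments, not executed here):
#   videos = [v.get_attribute('src')
#             for v in driver.find_elements(By.CSS_SELECTOR, "video")]
#   images = [i.get_attribute('src')
#             for i in driver.find_elements(By.CSS_SELECTOR, "img[style='object-fit: cover;']")]
#   url, filename = choose_media(videos, images, shortcode)
#   if url:
#       urlretrieve(url, filename)
```

Note that find_elements (plural) returns an empty list when nothing matches instead of raising, so no try-except is needed for this kind of check.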

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1  Duc Nguyen