Python Instagram Photographs Download

I'm a Python newbie who has been assigned the task of downloading and storing locally at least 100 to 200 photographs, preferably in .jpg format. The code was provided to me, but so far I haven't been able to get it to work. The code goes to https://www.instagram.com/explore/tags/feetphotos to get the photographs.

I've created an account to access the images. The code creates an insta_foot.csv file containing an index and a link to each photograph to be downloaded. Sometimes the .csv file is created with the indices and links, and sometimes it is created without them. The last part of the code downloads the photographs from each of the links into an images directory created locally by the script.
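
As a rough illustration of the download step described above, here is a minimal sketch. It assumes the links have already been collected and that the images directory may not exist yet; `local_path_for` and `download_all` are hypothetical helper names, not part of the code I was given.

```python
import os
from urllib.parse import urlparse


def local_path_for(link, size, out_dir="images"):
    """Build the local .jpg path for one image link (hypothetical helper)."""
    # Use only the last path component, so any query string is ignored.
    name = os.path.basename(urlparse(link).path)
    stem, _ext = os.path.splitext(name)
    return os.path.join(out_dir, f"{stem}_{size}.jpg")


def download_all(links, size="640w", out_dir="images"):
    """Download each link into out_dir, creating the directory first."""
    import requests  # third-party; imported here so the helper above needs only the stdlib

    # Assumption: the directory may be missing, so create it before writing.
    os.makedirs(out_dir, exist_ok=True)
    for i, link in enumerate(links):
        response = requests.get(link, timeout=30)
        response.raise_for_status()  # stop on HTTP errors instead of saving an error page
        with open(local_path_for(link, size, out_dir), "wb") as f:
            f.write(response.content)
        print("index:", i)
```

Calling `download_all(df['640w'])` would then stand in for the final loop, assuming `df` holds one link per row in that column.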

import time
import pandas as pd
import requests
import bs4 as bs
from selenium import webdriver
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")
url = 'https://www.instagram.com/explore/tags/foot/'
driver.get(url)

img_sizes = ['150w', '240w', '320w', '480w', '640w']
df = pd.DataFrame(columns = img_sizes)

last_height = driver.execute_script("return document.body.scrollHeight") 
while True:
    el = driver.find_element_by_tag_name('body')
    soup = bs.BeautifulSoup(el.get_attribute('innerHTML'), 'lxml')

    for t in soup.findAll('img', {"class": "FFVAD"}):
        a_series = pd.Series(['https://'+s.split(' ')[0] for s in 
        t['srcset'].split('https://')[1:]], index = df.columns)
        df = df.append(a_series, ignore_index=True)
    df.drop_duplicates(inplace = True) 

    print('last_height: ', last_height, '  links: ', len(df))

    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)

    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

df.to_csv('insta_foot.csv')

size = '640w'
for i, row in df.iterrows():
    link = row[size]
    n = 'images/' + [e for e in link.split('/') if '.jpg' in e][0].split('.jpg')[0] + '_' + size + '.jpg'

    with open(n,"wb") as f:
        f.write(requests.get(link).content)

    print('index: ', i)


driver.close()
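
The scraping loop builds each row with `index = df.columns`, which only works when an image's `srcset` lists exactly five sizes; with any other count, `pd.Series` raises a length-mismatch error. A minimal sketch of a more tolerant parser follows. It assumes the `srcset` candidates are comma-separated and that the URLs themselves contain no commas; `parse_srcset` is a hypothetical helper name.

```python
def parse_srcset(srcset):
    """Map each width descriptor (e.g. '640w') to its URL."""
    sizes = {}
    for candidate in srcset.split(","):
        parts = candidate.strip().split()
        # Keep only well-formed "URL descriptor" pairs; skip anything else.
        if len(parts) == 2:
            url, descriptor = parts
            sizes[descriptor] = url
    return sizes


# Example with made-up URLs: only two sizes are present, and that is fine.
example = "https://example.com/a.jpg 150w,https://example.com/b.jpg 640w"
links = parse_srcset(example)
print(links.get("640w"))
```

Looking sizes up by name with `.get()` means a missing size yields `None` rather than stopping the whole run.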


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source