Download image to a folder using web-scraping
I want to download the images referenced in the page's HTML using bs4. My code already extracts the date and exam name from HTML like the following:
<div class="item-name" data-toggle="collapse" data-target="#exam-4" aria-expanded=false>
  <div class="ui-h2">April 2022 <span class="ui-tag grey-transparent">14 Exams</span></div>
</div>
<div class="item-details collapse " id="exam-4" data-parent="#exam-month">
  <div class="row">
    <div class="col-12 col-lg-4">
      <div class="ui-card hover-scale">
        <a href="https://example.com/uppsc-acf-rfo" class="card-link exam-cards">
          <div>
            <span class="icon calendar-icon"></span>
            <span class="help__content help__content--small">3 Apr 2022</span>
            <span class="ui-tag green-filled">Official</span>
          </div>
          <div class="footer-container">
            <span class="exam-icon">
              <img src="https://blogmedia.com/blog/wp-content/uploads/2020/06/uttar-pradesh-logo-png-8-5bbbec3b.png" height="30">
            </span>
            <span class="exam-name" title="UPPSC ACF RFO Mains">UPPSC ACF RFO Mains</span>
            <span class="exam-cta"> Know More <span class="right-icon"></span> </span>
          </div>
        </a>
      </div>
    </div>
I am using the following code:
import re

import pandas as pd
from bs4 import BeautifulSoup

# `html` holds the page source as a string
soup = BeautifulSoup(html, 'html.parser')
rows = soup.find_all('div', {'class': 'row'})
rowList = []
for row in rows:
    cards = row.find_all('div', {'class': re.compile("^ui-card hover-scale")})
    for card in cards:
        dateStr = card.find('span', {'class': re.compile("^help__content")}).text.strip()
        examName = card.find('span', {'class': 'exam-name'}).text
        rowList.append({'date': dateStr,
                        'exam': examName})
df = pd.DataFrame(rowList)
df.to_csv('filename.csv', index=False)
Current Output:
0 3 Apr 2022 UPPSC ACF RFO Mains
Expected Output:
0 3 Apr 2022 UPPSC ACF RFO Mains uttar-pradesh-logo-png-8-5bbbec3b.png
The .png file should also be saved to a separate directory. PS: I am only including part of the HTML; there are multiple cards.
Solution 1:[1]
import re
import urllib.request

import pandas as pd
from bs4 import BeautifulSoup

# `html` holds the page source as a string
soup = BeautifulSoup(html, 'html.parser')
rows = soup.find_all('div', {'class': 'row'})
rowList = []
for row in rows:
    cards = row.find_all('div', {'class': re.compile("^ui-card")})
    for card in cards:
        # Each field falls back to 'N/A' if its element is missing from the card
        try:
            dateStr = card.find('span', {'class': re.compile("^help__content")}).text.strip()
        except Exception as e:
            print(e)
            dateStr = 'N/A'
        try:
            examName = card.find('span', {'class': 'exam-name'}).text
        except Exception as e:
            print(e)
            examName = 'N/A'
        try:
            imgUrl = card.find('img')['src']
            # Use the last URL segment as the local file name
            imgFile = imgUrl.split('/')[-1]
            # Download the image and write it to a file in the current directory
            urllib.request.urlretrieve(imgUrl, imgFile)
        except Exception as e:
            print(e)
            imgFile = 'N/A'
        rowList.append({'date': dateStr,
                        'exam': examName,
                        'img': imgFile})
df = pd.DataFrame(rowList)
df.to_csv('filename.csv', index=False)
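The code above writes each image next to the script, while the question asks for the .png files to go into a separate directory. One way to do that is sketched below, using only the standard library. The helper names `local_image_path` and `download_image`, the default folder name `images`, and the `fetch` parameter are assumptions made for illustration and are not part of the original answer.

```python
import os
import urllib.request
from urllib.parse import urlparse


def local_image_path(img_url, out_dir):
    """Return the path inside out_dir where the image at img_url would be saved."""
    # Take the file name from the URL path only, so any query string is ignored.
    file_name = os.path.basename(urlparse(img_url).path)
    return os.path.join(out_dir, file_name)


def download_image(img_url, out_dir="images", fetch=urllib.request.urlretrieve):
    """Save the image into out_dir and return the bare file name for the CSV.

    `fetch` is a hypothetical hook: any callable taking (url, path). It
    defaults to urllib.request.urlretrieve and can be swapped out for testing.
    """
    os.makedirs(out_dir, exist_ok=True)  # create the folder if it does not exist
    target = local_image_path(img_url, out_dir)
    fetch(img_url, target)  # urlretrieve(url, filename) writes the file to disk
    return os.path.basename(target)
```

Inside the loop, the two lines that compute `imgFile` and call `urlretrieve` could then be replaced by `imgFile = download_image(imgUrl)`, which keeps only the file name in the CSV while the file itself lands in the `images` folder.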
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
