`chapters[i].find('a').attrs['title']` raises `AttributeError: 'NoneType' object has no attribute 'attrs'`
I am trying to get the `href` of every chapter link so that I can visit each one and download all the images inside, but I get an error saying the result has no attribute `attrs`:
```python
import requests
from bs4 import BeautifulSoup
import os

url = 'https://readmanganato.com/manga-dr980474'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
images = soup.find_all('img')
chapters = []
chapters_num = []
chapters = soup.find_all('a', {"class": "chapter-name text-nowrap"})
for i in range(len(chapters)):
    chapters_num.append(chapters[i])
    chapters[i].find('a').attrs['title']  # this line raises the AttributeError
print(chapters_num)
```
Solution 1:[1]
Expanding on the comments:

- Consider switching to a `for item in list` style loop.
- Consider a try/except block to help with troubleshooting, as web scraping is full of potholes to catch your feet.
- For extracting the `title` attribute from each `a` (anchor) tag, go straight to the `attrs` without the `find()`. Each element of `chapters` is already an `a` tag, so `chapters[i].find('a')` searches that tag's descendants, finds nothing, and returns `None` (see the short demonstration after this list).
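A minimal demonstration of the root cause, using a made-up single-link snippet rather than the live page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with one chapter-style link:
soup = BeautifulSoup('<a title="Chapter 1" href="/ch-1">Chapter 1</a>',
                     'html.parser')
link = soup.find('a')

print(link.find('a'))       # None: no <a> is nested inside this <a>
print(link.attrs['title'])  # 'Chapter 1': read the attribute directly
```

Applying all three points to the original script: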
```python
import requests
from bs4 import BeautifulSoup

url = 'https://readmanganato.com/manga-dr980474'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
images = soup.find_all('img')
chapters = soup.find_all('a', {"class": "chapter-name text-nowrap"})
chapters_num = []
titles = []

for chapter in chapters:
    chapters_num.append(chapter)
    try:
        # Each chapter is already an <a> tag, so read its attrs directly.
        titles.append(chapter.attrs['title'])
    except Exception:
        print('failed extracting chapter:', chapter)

print(titles)
print(chapters_num)
```
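With this change, `titles` should hold plain strings such as 'Solo Leveling Chapter 180' (compare the sample output under Solution 2 below), while `chapters_num` still holds the full `a` tags.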
Solution 2:[2]
Try the following approach:
```python
import requests
from bs4 import BeautifulSoup

url = 'https://readmanganato.com/manga-dr980474'
req_main = requests.get(url)
soup_main = BeautifulSoup(req_main.content, 'html.parser')
data = []

# href=True restricts the results to <a> tags that actually have an href.
for chapter in soup_main.find_all('a', {"class": "chapter-name text-nowrap"}, href=True):
    req_sub = requests.get(chapter['href'])
    soup_sub = BeautifulSoup(req_sub.content, 'html.parser')
    imgs = [img['src'] for img in soup_sub.find_all('img')]
    data.append([chapter['title'], chapter['href'], imgs])

for title, href, imgs in data:
    print(title, href, imgs)
```
This shows an easier way to iterate over the `a` tags. The `href=True` filter ensures that only tags with an `href` attribute are returned; if you have problems with missing titles, you could also add `title=True`, e.g. `find_all('a', {"class": "chapter-name text-nowrap"}, href=True, title=True)`.
It then shows how to fetch each sub-page and extract a list of all the image URLs on it. You would probably want to add a loop that actually downloads the images rather than just collecting the URLs; a sketch follows the sample output below.
This would give the first two entries as:
Solo Leveling Chapter 180 https://readmanganato.com/manga-dr980474/chapter-180 ['https://readmanganato.com/themes/hm/images/logo-chap.png', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_180/1-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_180/2-n.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_180/3-o.jpg', 'https://readmanganato.com/themes/hm/images/gohome.png']
Solo Leveling Chapter 179.2 https://readmanganato.com/manga-dr980474/chapter-179.2 ['https://readmanganato.com/themes/hm/images/logo-chap.png', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/1-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/2-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/3-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/4-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/5-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/6-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/7-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/8-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/9-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/10-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/11-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/12-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/13-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/14-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/15-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/16-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/17-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/18-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/19-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/20-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/21-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/22-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/23-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/24-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/25-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/26-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/27-o.jpg', 'https://v7.mkklcdnv6tempv3.com/img/tab_7/02/91/17/dr980474/chapter_179_2/28-o.jpg', 'https://readmanganato.com/themes/hm/images/gohome.png']
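The download loop itself was left as an exercise. The following is a minimal sketch, not part of the original answer: the `download_chapter` helper, the `downloads` output directory, and the `Referer` header are all assumptions (some image CDNs reject requests that arrive without a referer, so sending one is a reasonable precaution).

```python
import os
import requests

def download_chapter(title, imgs, out_dir='downloads',
                     referer='https://readmanganato.com/'):
    """Hypothetical helper: save every image URL in imgs under out_dir/title."""
    folder = os.path.join(out_dir, title)
    os.makedirs(folder, exist_ok=True)
    headers = {'Referer': referer}  # assumption: the CDN may require a referer
    for url in imgs:
        filename = os.path.join(folder, url.rsplit('/', 1)[-1])
        resp = requests.get(url, headers=headers)
        if resp.ok:
            with open(filename, 'wb') as f:
                f.write(resp.content)
        else:
            print('failed:', url, resp.status_code)

# Reusing the data list built above:
# for title, href, imgs in data:
#     download_chapter(title, imgs)
```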
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | JNevill |
| Solution 2 | Martin Evans |
