Cannot get the "href" attributes via BeautifulSoup
In short, I can't get the links from the "href" attribute from this link (a Turkish online seller of books and related items).
Here's my code (I know it's not the best; I've been learning Python online for a few months, so any pointers on best practices are also welcome). I tried to get the book names, writers, prices, publishers and the link for each book. Everything except the links works as I expected.
import requests
import pandas as pd
from bs4 import BeautifulSoup
from time import sleep
from random import randint

yazar = []
fiyat = []
yayın = []
isim = []

for i in range(1,10):
    url = "https://www.dr.com.tr/CokSatanlar/Kitap#/page="+str(i)
    page = requests.get(url)
    soup = BeautifulSoup(page.text, "lxml")

    # book names
    k = soup.find_all("a", {"class":"prd-name"})
    for i in k:
        isim.append(i.text)

    # writer names
    y = soup.find_all("a", {"class":"who text-overflow"})
    for i in y:
        yazar.append(i.text)

    # prices
    f = soup.find_all("div", {"class":"prd-price"})
    for i in f:
        fiyat.append(i.text.split()[0])

    # publishers
    ye = soup.find_all("a", {"class":"prd-publisher"})
    for i in ye:
        yayın.append(i.get("title"))

    sleep(randint(2, 4))
However, when I try to get the links with
soup.find_all("a", {"class":"prd-name"}).get("href")
it returns None, and I couldn't make this work whatever I tried. Thank you all in advance, and sorry for the longer-than-usual post.
Solution 1:[1]
I think you won't get None here; you will get this error:
AttributeError: ResultSet object has no attribute 'get'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
find_all() returns a ResultSet, so you have to iterate over it to get the href of each element:
for a in soup.find_all("a", {"class":"prd-name"}):
    print('https://www.dr.com.tr'+a.get("href"))
Output
https://www.dr.com.tr/kitap/daha-adil-bir-dunya-mumkun/arastirma-tarih/politika-arastirma/turkiye-politika-/urunno=0001934858001
https://www.dr.com.tr/kitap/burasi-cok-onemli-enerjiden-ekonomiye-tam-bagimsiz-turkiye/arastirma-tarih/politika-arastirma/turkiye-politika-/urunno=0001966362001
https://www.dr.com.tr/kitap/iz-biraktigin-kadar-varsin/egitim-basvuru/psikoloji-bilimi/urunno=0001947472001
https://www.dr.com.tr/kitap/simdi-onlar-dusunsun/bircan-yildirim/egitim-basvuru/kisisel-gelisim/urunno=0001964436001
https://www.dr.com.tr/kitap/kadinlar-sicak-erkekler-soguk-sever/esra-ezmeci/egitim-basvuru/psikoloji-bilimi/urunno=0001904239001
https://www.dr.com.tr/kitap/dustugunde-kalkarsan-hayat-guzeldir/egitim-basvuru/psikoloji-bilimi/urunno=0001816754001
...
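For reference, here is a minimal self-contained sketch of the same idea, run against a small inline HTML snippet rather than the live site. The markup, the book paths and the BASE_URL name are assumptions made for illustration only. It uses urljoin from the standard library instead of string concatenation, which is a substitution on my part, so that relative and absolute links are both handled.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure described in the question.
HTML = """
<div><a class="prd-name" href="/kitap/first-book/urunno=1">First Book</a></div>
<div><a class="prd-name" href="/kitap/second-book/urunno=2">Second Book</a></div>
"""
BASE_URL = "https://www.dr.com.tr"  # assumed base for the relative links

soup = BeautifulSoup(HTML, "html.parser")
anchors = soup.find_all("a", {"class": "prd-name"})

# find_all() returns a ResultSet (a list subclass), which has no .get() method.
try:
    anchors.get("href")
    raised = False
except AttributeError:
    raised = True

# Each element of the ResultSet is a Tag, and Tag.get() reads an attribute.
links = [urljoin(BASE_URL, a.get("href")) for a in anchors]

# find() returns a single Tag (or None), so .get() can be called on it directly.
first_link = urljoin(BASE_URL, soup.find("a", {"class": "prd-name"}).get("href"))

print(raised)
print(links)
```

Collecting the links into a list this way keeps them in the same order as the book names gathered from the same elements, so the two lists can later be placed side by side.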
Solution 2:[2]
The data you see on the page is loaded from a separate endpoint, so you need to request a different URL to get the correct data:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.dr.com.tr/Catalog/CatalogProducts"

data = {
    "catalogId": "4020",
    "page": "1",
    "sortfield": "soldcount",
    "sortorder": "desc",
    "size": "60",
    "categoryid": "0",
    "parentId": "0",
    "mediatypes": "",
    "HideNotForSale": "true",
    "minPrice": "-1",
    "maxPrice": "-1",
    "writer": "",
    "minDiscount": "-1",
    "maxdiscount": "-1",
    "language": "",
}

all_data = []
for page in range(1, 3):  # <-- increase number of pages here
    print(f"Getting page {page}")
    data["page"] = page
    soup = BeautifulSoup(requests.post(url, data=data).content, "html.parser")
    for p in soup.select(".prd-content"):
        all_data.append(p.get_text(strip=True, separator="|").split("|")[:5])

df = pd.DataFrame(
    all_data, columns=["name", "autor", "price", "type", "publisher"]
)
print(df)
df.to_csv("data.csv", index=False)
Prints:
                                               name            autor      price        type         publisher
0  Esra Ezmeci Seti 5 Kitap Takım - Defter Hediyeli      Esra Ezmeci  155,45 TL  İnce Kapak  Destek Yayınları
1                              Şimdi Onlar Düşünsün  Bircan Yıldırım   36,20 TL  İnce Kapak  Destek Yayınları
2                        İz Bıraktığın Kadar Varsın      Esra Ezmeci   36,20 TL  İnce Kapak  Destek Yayınları
...
and saves the result to data.csv.
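As background on why the loop in the question kept returning the same books: everything after "#" in a URL is a fragment, which the client keeps to itself and does not send to the server. The addresses ending in "#/page=2" and "#/page=9" therefore request the same document, and the paging is presumably done afterwards by JavaScript in the browser. The sketch below illustrates this with the standard library only; the helper name request_target is made up for this example.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path (plus query, if any) that an HTTP client sends to the server."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target  # parts.fragment is deliberately left out


page_2 = "https://www.dr.com.tr/CokSatanlar/Kitap#/page=2"
page_9 = "https://www.dr.com.tr/CokSatanlar/Kitap#/page=9"

print(urlsplit(page_2).fragment)  # /page=2
print(request_target(page_2))     # /CokSatanlar/Kitap
print(request_target(page_2) == request_target(page_9))  # True
```

This is why the page number has to travel in the request itself, as the "page" field of the POST data does above, rather than in the fragment.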
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | HedgeHog |
| Solution 2 | Andrej Kesely |

