Cannot get the "href" attributes via BeautifulSoup

In short, I can't get the links from the "href" attributes on this site (a Turkish online seller of books and related items).

Here's my code (I know it's not the best; I've been learning Python online for a few months, so any pointers on best practices are also welcome). I tried to get the book names, writers, prices, publishers, and the link for each book; without the links it works as I expected.

import requests
import pandas as pd
from bs4 import BeautifulSoup
from time import sleep
from random import randint

yazar = []  # writers
fiyat = []  # prices
yayın = []  # publishers
isim = []   # book names

for page_no in range(1, 10):
    url = "https://www.dr.com.tr/CokSatanlar/Kitap#/page=" + str(page_no)
    page = requests.get(url)
    soup = BeautifulSoup(page.text, "lxml")
    # book names
    for a in soup.find_all("a", {"class": "prd-name"}):
        isim.append(a.text)
    # writer names
    for a in soup.find_all("a", {"class": "who text-overflow"}):
        yazar.append(a.text)
    # prices
    for div in soup.find_all("div", {"class": "prd-price"}):
        fiyat.append(div.text.split()[0])
    # publishers
    for a in soup.find_all("a", {"class": "prd-publisher"}):
        yayın.append(a.get("title"))

    sleep(randint(2, 4))

However, when I try to get the links with

soup.find_all("a", {"class":"prd-name"}).get("href")

it returns None, and I couldn't manage to make this work whatever I tried. Thank you all in advance, and sorry for a slightly longer than usual post.



Solution 1:[1]

I think you won't get None; instead you will get:

AttributeError: ResultSet object has no attribute 'get'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

find_all() produces a ResultSet, so you have to iterate over it to get each href:

for a in soup.find_all("a", {"class":"prd-name"}):
    print('https://www.dr.com.tr'+a.get("href"))
Output
https://www.dr.com.tr/kitap/daha-adil-bir-dunya-mumkun/arastirma-tarih/politika-arastirma/turkiye-politika-/urunno=0001934858001
https://www.dr.com.tr/kitap/burasi-cok-onemli-enerjiden-ekonomiye-tam-bagimsiz-turkiye/arastirma-tarih/politika-arastirma/turkiye-politika-/urunno=0001966362001
https://www.dr.com.tr/kitap/iz-biraktigin-kadar-varsin/egitim-basvuru/psikoloji-bilimi/urunno=0001947472001
https://www.dr.com.tr/kitap/simdi-onlar-dusunsun/bircan-yildirim/egitim-basvuru/kisisel-gelisim/urunno=0001964436001
https://www.dr.com.tr/kitap/kadinlar-sicak-erkekler-soguk-sever/esra-ezmeci/egitim-basvuru/psikoloji-bilimi/urunno=0001904239001
https://www.dr.com.tr/kitap/dustugunde-kalkarsan-hayat-guzeldir/egitim-basvuru/psikoloji-bilimi/urunno=0001816754001
...
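
If you only need the first match, find() returns a single Tag (or None), so you can call .get("href") on it directly. A minimal sketch:

# a minimal sketch: find() returns the first matching Tag, or None if
# nothing matches, so guard before calling .get("href")
first = soup.find("a", {"class": "prd-name"})
if first is not None:
    print("https://www.dr.com.tr" + first.get("href"))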

Solution 2:[2]

The data you see on the page is loaded from an external location, so you need a different URL to get the correct data:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.dr.com.tr/Catalog/CatalogProducts"

data = {
    "catalogId": "4020",
    "page": "1",
    "sortfield": "soldcount",
    "sortorder": "desc",
    "size": "60",
    "categoryid": "0",
    "parentId": "0",
    "mediatypes": "",
    "HideNotForSale": "true",
    "minPrice": "-1",
    "maxPrice": "-1",
    "writer": "",
    "minDiscount": "-1",
    "maxdiscount": "-1",
    "language": "",
}

all_data = []
for page in range(1, 3):  # <-- increase number of pages here
    print(f"Getting page {page}")

    data["page"] = page
    soup = BeautifulSoup(requests.post(url, data=data).content, "html.parser")

    for p in soup.select(".prd-content"):
        all_data.append(p.get_text(strip=True, separator="|").split("|")[:5])


df = pd.DataFrame(
    all_data, columns=["name", "author", "price", "type", "publisher"]
)
print(df)
df.to_csv("data.csv", index=False)

Prints:

                                                                         name               author      price        type          publisher
0                            Esra Ezmeci Seti 5 Kitap Takım - Defter Hediyeli          Esra Ezmeci  155,45 TL  İnce Kapak   Destek Yayınları
1                                                        Şimdi Onlar Düşünsün      Bircan Yıldırım   36,20 TL  İnce Kapak   Destek Yayınları
2                                                  İz Bıraktığın Kadar Varsın          Esra Ezmeci   36,20 TL  İnce Kapak   Destek Yayınları

...

and saves data.csv.

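Since the original question asked for the links, the same POST response can be parsed for hrefs as well. A minimal sketch, assuming the returned HTML fragment contains the same a.prd-name anchors (with relative hrefs) as the rendered page, and reusing the url and data defined above:

# a minimal sketch, assuming the fragment returned by the POST request
# also contains a.prd-name anchors with relative hrefs
soup = BeautifulSoup(requests.post(url, data=data).content, "html.parser")
links = ["https://www.dr.com.tr" + a["href"] for a in soup.select("a.prd-name")]
print(links[:5])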

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: HedgeHog
Solution 2: Andrej Kesely