How to get the href value of a link with bs4?

I need help extracting the URLs of all the 2020/2021 matches from this [website][1] so that I can scrape them.

I am sending a request to this link.

The section of the HTML that I want to retrieve is this part:


Here's the code that I am using:

    from bs4 import BeautifulSoup
    import requests
    import pandas as pd
    import urllib.parse
    website = 'https://www.espncricinfo.com/series/ipl-2020-21-1210595/match-results'
    response = requests.get(website)
    soup = BeautifulSoup(response.content,'html.parser')
    match_result = soup.find_all('a', {'class': 'match-info-link-FIXTURES'})
    soup.get('href')
    url_part_1 = 'https://www.espncricinfo.com/'
    url_part_2 = []
    for item in match_result:
        url_part_2.append(item.get('href'))
    url_joined = []
    for link_2 in url_part_2:
        url_joined.append(urllib.parse.urljoin(url_part_1,link_2))
    first_link = url_joined[0]
    match_url = soup.find_all('div', {'class': 'link-container border-bottom'})
    soup.get('href')
    url_part_3 = 'https://www.espncricinfo.com/'
    url_part_4 = []
    for item in match_result:
        url_part_4.append(item.get('href'))
    
    print(url_part_4)

  [1]: https://www.espncricinfo.com/series/ipl-2020-21-1210595/match-results

python beautifulsoup

Solution 1:^[1]

You don't need the second `item.find_all('a', {'class': 'match-info-link-FIXTURES'})` call inside the `for item in match_result:` loop, since `match_result` already holds the tags that carry the hrefs.

You can get the href of each tag with `item.get('href')`.

You can do:

    url_part_1 = 'https://www.espncricinfo.com/'
    url_part_2 = []
    for item in match_result:
        url_part_2.append(item.get('href'))

The result will look something like:

    ['/series/ipl-2020-21-1210595/delhi-capitals-vs-mumbai-indians-final-1237181/full-scorecard',
     '/series/ipl-2020-21-1210595/delhi-capitals-vs-sunrisers-hyderabad-qualifier-2-1237180/full-scorecard',
     '/series/ipl-2020-21-1210595/royal-challengers-bangalore-vs-sunrisers-hyderabad-eliminator-1237178/full-scorecard',
     '/series/ipl-2020-21-1210595/delhi-capitals-vs-mumbai-indians-qualifier-1-1237177/full-scorecard',
     '/series/ipl-2020-21-1210595/sunrisers-hyderabad-vs-mumbai-indians-56th-match-1216495/full-scorecard',
    ...
    ]
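The hrefs in this list are relative paths, so they still have to be combined with the site root before they can be requested. A minimal sketch of that step, using `urllib.parse.urljoin` as in the question; the name `BASE_URL` and the two-item sample list are illustrative only, with the paths copied from the result above:

```python
from urllib.parse import urljoin

# Site root, as used in the question's url_part_1.
BASE_URL = "https://www.espncricinfo.com/"

# Two of the relative hrefs from the result shown above.
relative_links = [
    "/series/ipl-2020-21-1210595/delhi-capitals-vs-mumbai-indians-final-1237181/full-scorecard",
    "/series/ipl-2020-21-1210595/delhi-capitals-vs-mumbai-indians-qualifier-1-1237177/full-scorecard",
]

# urljoin resolves each root-relative path against the base URL.
absolute_links = [urljoin(BASE_URL, href) for href in relative_links]

for link in absolute_links:
    print(link)
```

Because the paths start with `/`, `urljoin` replaces any path on the base URL rather than appending to it, so the trailing slash on `BASE_URL` does not produce a double slash.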

Solution 2:^[2]

From the official documentation:

It’s very useful to search for a tag that has a certain CSS class, but the name of the CSS attribute, “class”, is a reserved word in Python. Using class as a keyword argument will give you a syntax error. As of Beautiful Soup 4.1.2, you can search by CSS class using the keyword argument class_.

Try:

    soup.find_all("a", class_="match-info-link-FIXTURES")
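As a rough illustration, the sketch below runs the `class_` search against a small hand-written HTML fragment instead of the live page. The fragment and its paths are made up for the example; only the class name comes from the question. It also runs the dictionary form used in the question, which should select the same tags.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the real page markup.
html = """
<div>
  <a class="match-info-link-FIXTURES" href="/series/example/match-1/full-scorecard">Match 1</a>
  <a class="match-info-link-FIXTURES" href="/series/example/match-2/full-scorecard">Match 2</a>
  <a class="other-link" href="/ignored">Other</a>
  <a class="match-info-link-FIXTURES">No href</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Keyword form from the documentation, and the dict form from the question.
by_keyword = soup.find_all("a", class_="match-info-link-FIXTURES")
by_dict = soup.find_all("a", {"class": "match-info-link-FIXTURES"})

# Tag.get returns None when the attribute is missing, so skip those tags.
hrefs = [a.get("href") for a in by_keyword if a.get("href")]
print(hrefs)
```

Filtering out the `None` values keeps a later `urljoin` or `requests.get` call from receiving a missing href.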

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	D Malan
Solution 2	BlackCat
