How can I get the data in the href attribute of an <a> tag using BeautifulSoup in Python?

import requests
from bs4 import BeautifulSoup

url = 'https://www.maritimecourier.com/restaurant'

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '\
       'AppleWebKit/537.36 (KHTML, like Gecko) '\
       'Chrome/75.0.3770.80 Safari/537.36'}

response = requests.get(url,headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

test = soup.select('.underline-body-links .sqs-block a, .underline-body-links .entry-content a, .underline-body-links .eventlist-excerpt a, .underline-body-links .playlist-description a, .underline-body-links .image-description a, .underline-body-links .sqs-block a:visited, .underline-body-links .entry-content a:visited, .underline-body-links .eventlist-excerpt a:visited, .underline-body-links .playlist-description a:visited, .underline-body-links .image-description a:visited')
test

With this code I get this output:

[<a href="https://www.instagram.com/breakfast_dreams/" target="_blank">Breakfast Dreams</a>,
 <a href="https://www.maritimecourier.com/breakfast-dreams" target="_blank">MARITIME</a>,
 <a href="https://www.instagram.com/latarantellalb/" target="_blank">La Tarantella</a>]

Now I am trying to get the URL and the name from each <a> tag.

I would like to know how I can do this. So far I have tried this:

results = []

for restaurant in soup.select('.underline-body-links .sqs-block a, .underline-body-links .entry-content a, .underline-body-links .eventlist-excerpt a, .underline-body-links .playlist-description a, .underline-body-links .image-description a, .underline-body-links .sqs-block a:visited, .underline-body-links .entry-content a:visited, .underline-body-links .eventlist-excerpt a:visited, .underline-body-links .playlist-description a:visited, .underline-body-links .image-description a:visited'):
    results.append({
        'title':restaurant.find('a',{'target':'_blank'}).text
    })
results

But I got this error:

'NoneType' object has no attribute 'text'


Solution 1:[1]

Your selection is not quite clear, and neither is the expected output. The main issue is that you have already selected the <a> elements and then try to find an <a> inside each <a>, so find() returns None.

So your extraction part should look more like this:

results.append({
    'title': restaurant.text,
    'url': restaurant.get('href')
})
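
To see the difference in isolation, here is a minimal sketch that runs against a small inline HTML snippet instead of the live page. The markup and the names `html`, `links` and `nested` are assumptions made for illustration; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption).
html = """
<div class="sqs-block-content">
  <p><a href="https://www.instagram.com/breakfast_dreams/" target="_blank">Breakfast Dreams</a></p>
  <p><a href="https://www.maritimecourier.com/breakfast-dreams" target="_blank">MARITIME</a></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
links = soup.select(".sqs-block-content a")

# Each selected element is already an <a>. It contains no nested <a>,
# so find() returns None and calling .text on the result would fail.
nested = links[0].find("a", {"target": "_blank"})

# Read the text and the href directly from the selected element instead.
results = [{"title": a.get_text(strip=True), "url": a.get("href")} for a in links]
print(results)
```

Using `a.get("href")` rather than `a["href"]` returns None instead of raising a KeyError when an anchor has no href attribute.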

You could also make your selection more specific:

[{'title':a.text, 'url':a.get('href')} for a in soup.select('.sqs-block-content a')]

or without all the internal links:

[{'title':a.text, 'url':a.get('href')} for a in soup.select('.sqs-block-content a') if 'maritimecourier' not in a.get('href', '')]
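
A substring check can misfire if an external URL happens to contain the site name. As a possible alternative, the following sketch compares hostnames with `urllib.parse.urlparse`. The helper `is_external`, its `site_host` default, and the inline markup are hypothetical and not part of the original answer.

```python
from urllib.parse import urlparse

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption).
html = """
<div class="sqs-block-content">
  <a href="https://www.instagram.com/breakfast_dreams/">Breakfast Dreams</a>
  <a href="https://www.maritimecourier.com/breakfast-dreams">MARITIME</a>
  <a href="/restaurant">Relative link</a>
  <a>No href at all</a>
  <a href="https://www.instagram.com/latarantellalb/">La Tarantella</a>
</div>
"""


def is_external(href, site_host="www.maritimecourier.com"):
    """Return True when href has a hostname that differs from site_host."""
    host = urlparse(href).netloc
    # Relative links have an empty netloc and are treated as internal.
    return bool(host) and host != site_host


soup = BeautifulSoup(html, "html.parser")

# The a[href] selector skips anchors without an href attribute,
# so a["href"] is safe to use here.
external = [
    {"title": a.get_text(strip=True), "url": a["href"]}
    for a in soup.select(".sqs-block-content a[href]")
    if is_external(a["href"])
]
print(external)
```

Whether relative links should count as internal is a design choice; here they are dropped along with links to the site's own host.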

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 HedgeHog