Find link in specific tags using python, beautiful soup and lambda functions

I have the following HTML and I am using bs4 (Beautiful Soup) in Python 3 to extract all hrefs from links that are located inside a specific tag: <article>. It should not matter whether there is more than one link, or no link at all, nested in the <article>. In a further step, I filter out links that don't have the "base.html" ending.

<article>
   <a href='link/base.html'>click me!</a>
</article>
...
<article>
   <a href='link2/base.html'>click me!</a>
</article>
...
<article>
   <a href='link3/base.html'>click me!</a>
</article>

This is my code

page = bs4.BeautifulSoup(source, 'html.parser')
articles = page.find_all(name="article")
article_links = map(lambda article: article.a, articles)
article_links = map(lambda tag: tag.get('href'), article_links)
article_links = filter(lambda link: 'base.html' in link, article_links)
article_links = map(lambda link: url + link, article_links)

However, this results in an

AttributeError: 'NoneType' object has no attribute 'get'

at the .get('href') part in line 4. Other variations result in different errors. It needs to be lambda functions. Preferably, I would also like to combine the first two lambda functions into one.



Solution 1:[1]

It is not clear why lambdas are required, so as an alternative: select your targets more specifically with CSS selectors and build the result list with a list comprehension:

[url+a['href'] for a in page.select('article a[href*="base.html"]')]
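A side note on the selector: `[href*="base.html"]` matches any href that contains "base.html" anywhere, while `[href$="base.html"]` only matches hrefs that end with it, which may be closer to the "ending" requirement in the question. A minimal sketch of the difference, using made-up sample HTML (the second link is an assumption added for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one href ends with base.html, one merely contains it
html = '''<article>
   <a href='link/base.html'>ends with base.html</a>
   <a href='link/base.html?page=2'>only contains base.html</a>
</article>'''

page = BeautifulSoup(html, 'html.parser')

# *= is a substring match, $= is a suffix match
contains = [a['href'] for a in page.select('article a[href*="base.html"]')]
ends_with = [a['href'] for a in page.select('article a[href$="base.html"]')]

print(contains)   # both links
print(ends_with)  # only the first link
```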

Example

from bs4 import BeautifulSoup

url = 'http://www.example.com/'
html = '''<article>
   <a href='link/base.html'>click me!</a>
</article>
...
<article>
   <a href='link2/base.html'>click me!</a>
</article>
...
<article>
   <a href='link3/base.html'>click me!</a>
</article>'''

page = BeautifulSoup(html, 'html.parser')

[url+a['href'] for a in page.select('article a[href*="base.html"]')]

Output

['http://www.example.com/link/base.html',
 'http://www.example.com/link2/base.html',
 'http://www.example.com/link3/base.html']
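If lambdas really are a hard requirement, the original chain can probably be kept with one change. `article.a` returns `None` for an `<article>` that contains no `<a>`, which is the likely cause of the `AttributeError`, and it only ever returns the first link. The sketch below is one possible way around that, not the only one: a single lambda collects every href per article with `find_all('a', href=True)`, which also merges the first two lambdas from the question. The sample HTML (an article without a link, an article with two links) and the base `url` are assumptions for illustration.

```python
from bs4 import BeautifulSoup

url = 'http://www.example.com/'  # assumed base URL, as in the example above
html = '''<article>
   <a href='link/base.html'>click me!</a>
</article>
<article>
   <p>no link here</p>
</article>
<article>
   <a href='other/index.html'>skip me</a>
   <a href='link2/base.html'>click me!</a>
</article>'''

page = BeautifulSoup(html, 'html.parser')
articles = page.find_all(name='article')

# One lambda per article: a list of hrefs, empty if the article has no <a href>
hrefs_per_article = map(
    lambda article: [a['href'] for a in article.find_all('a', href=True)],
    articles,
)

# Flatten the per-article lists into a single sequence of hrefs
hrefs = [href for hrefs in hrefs_per_article for href in hrefs]

# Keep only links ending in base.html, then prepend the base URL
article_links = filter(lambda link: link.endswith('base.html'), hrefs)
article_links = list(map(lambda link: url + link, article_links))

print(article_links)
```

With this sample input the result contains the two `base.html` links; the article without a link contributes nothing instead of raising an error.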

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
