Find links in specific tags using Python, Beautiful Soup and lambda functions
I have the following HTML, and I am using bs4 (Beautiful Soup) in Python 3 to extract all hrefs from links that are located inside a specific tag: `<article>`. It should not matter whether there is more than one link, or no link at all, nested in the `<article>`. In a further step, I filter out links that don't end with "base.html".
<article>
<a href='link/base.html'>click me!</a>
</article>
...
<article>
<a href='link2/base.html'>click me!</a>
</article>
...
<article>
<a href='link3/base.html'>click me!</a>
</article>
This is my code:
page = bs4.BeautifulSoup(source, 'html.parser')
articles = page.find_all(name="article")
article_links = map(lambda article: article.a, articles)
article_links = map(lambda tag: tag.get('href'), article_links)
article_links = filter(lambda link: 'base.html' in link, article_links)
article_links = map(lambda link: url + link, article_links)
However, this raises an
AttributeError: 'NoneType' object has no attribute 'get'
at the .get('href') call in line 4. Other variations produce different errors. The solution needs to use lambda functions. Ideally, I would also like to combine the first two lambda functions into one.
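One likely cause can be reproduced in isolation. This minimal sketch uses hypothetical markup and assumes the real page (elided with "..." above) contains at least one `<article>` with no `<a>` inside. `article.a` is shorthand for `article.find('a')` and returns `None` in that case, so the next lambda ends up calling `.get` on `None`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second <article> contains no <a> tag.
html = """
<article><a href='link/base.html'>click me!</a></article>
<article><p>no link here</p></article>
"""

page = BeautifulSoup(html, 'html.parser')
articles = page.find_all(name="article")

# article.a is shorthand for article.find('a') and is None when
# the article has no <a> descendant.
first_links = list(map(lambda article: article.a, articles))
print(first_links[1])  # None

# map() is lazy; list() forces evaluation, which is where the
# lambda calls .get on None and raises AttributeError.
error_message = None
try:
    list(map(lambda tag: tag.get('href'), first_links))
except AttributeError as exc:
    error_message = str(exc)
print(error_message)  # 'NoneType' object has no attribute 'get'
```

Because `map` and `filter` are lazy, the exception surfaces only when the final iterator is consumed, even though it originates in the `.get('href')` lambda.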
Solution 1:[1]
Not sure why it has to be lambda, but just in case: select your targets more specifically with CSS selectors and iterate over the result set with a list comprehension:
[url+a['href'] for a in page.select('article a[href*="base.html"]')]
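The question asks for links that end in "base.html", while the `*=` attribute selector matches the substring anywhere in the href. CSS also has a suffix selector, `$=`, which Beautiful Soup's `select()` supports through its soupsieve backend. A minimal sketch with hypothetical markup contrasting the two:

```python
from bs4 import BeautifulSoup

url = 'http://www.example.com/'
# Hypothetical markup: the second href merely contains "base.html".
html = """
<article><a href='link/base.html'>click me!</a></article>
<article><a href='link2/base.html.bak'>backup</a></article>
"""
page = BeautifulSoup(html, 'html.parser')

# [href*="..."] matches the substring anywhere in the attribute value.
contains = [url + a['href'] for a in page.select('article a[href*="base.html"]')]
# [href$="..."] matches only values that end with the given string.
ends_with = [url + a['href'] for a in page.select('article a[href$="base.html"]')]

print(contains)   # both links
print(ends_with)  # only link/base.html
```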
Example
from bs4 import BeautifulSoup
url = 'http://www.example.com/'
html = '''<article>
<a href='link/base.html'>click me!</a>
</article>
...
<article>
<a href='link2/base.html'>click me!</a>
</article>
...
<article>
<a href='link3/base.html'>click me!</a>
</article>'''
page = BeautifulSoup(html, 'html.parser')
[url+a['href'] for a in page.select('article a[href*="base.html"]')]
Output
['http://www.example.com/link/base.html',
'http://www.example.com/link2/base.html',
'http://www.example.com/link3/base.html']
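If the pipeline must stay in the map/filter/lambda style from the question, a sketch along these lines avoids the `None` problem and merges the first two lambdas into one. It swaps `article.a` for `article.find_all('a', href=True)`, which returns an empty list instead of `None`, and uses `itertools.chain.from_iterable` to flatten the per-article lists, so articles with zero, one, or several links are all handled. The markup is hypothetical and deliberately includes an article without a link and an `<a>` without an href.

```python
from itertools import chain

from bs4 import BeautifulSoup

url = 'http://www.example.com/'
# Hypothetical markup covering the edge cases from the question.
html = """
<article><a href='link/base.html'>click me!</a></article>
<article><p>no link here</p></article>
<article><a>no href</a><a href='link3/base.html'>click me!</a></article>
<article><a href='link4/other.html'>other</a></article>
"""
page = BeautifulSoup(html, 'html.parser')
articles = page.find_all(name="article")

# One lambda replaces the first two: collect the href of every <a> in an
# article that actually has an href attribute. find_all returns an empty
# list when there is none, so nothing is ever called on None.
hrefs_per_article = map(
    lambda article: [a['href'] for a in article.find_all('a', href=True)],
    articles,
)

# Flatten the per-article lists into a single stream of hrefs.
hrefs = chain.from_iterable(hrefs_per_article)

# Keep only links ending with base.html, then prefix the base URL.
article_links = filter(lambda link: link.endswith('base.html'), hrefs)
article_links = list(map(lambda link: url + link, article_links))
print(article_links)
```

Using `str.endswith` rather than `'base.html' in link` matches the "base.html ending" requirement, and because only tags with an href are collected, the filter never receives `None` (which would otherwise raise a `TypeError` in the `in` test).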
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
