BeautifulSoup find all links that contain any of the following words
I'm trying to find hrefs that contain any of several words, for example google, yahoo, or msnbc.
I know how to find, say, all the google links, but I'm lost on the next step. Please help.
for a in elem.parent.find_all('a', string=re.compile('google')):
is what I'm using to find google.
Solution 1:[1]
Basically (at least as far as I understand it), the string parameter just checks whether a certain string is in the text of the tag, not in its href attribute, and it doesn't do so in a particularly efficient way (no hashing magic, it simply iterates over the text of the tag).
But you can do that check yourself and make it fit your needs, perhaps something like this:
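from bs4 import BeautifulSoup

# html_string is assumed to hold the HTML of the page being searched
soup = BeautifulSoup(html_string, features='html.parser')
keywords = {'google', 'yahoo', 'msnbc'}

def check_keywords(link_text, words):
    # Return True if any keyword occurs as a substring of link_text
    for word in words:
        if word in link_text:
            return True
    return False

# href=True skips <a> tags without an href, so a['href'] cannot raise KeyError
all_links = [a for a in soup.find_all('a', href=True) if check_keywords(a['href'], keywords)]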
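
If you would rather stay close to the re.compile approach from the question, a possible alternative is to pass a single regular expression as the href filter instead of string, since find_all accepts a compiled pattern for attribute values. The following is only a minimal sketch: the sample markup and the names pattern, matching_links and hrefs are made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, for illustration only
html_string = """
<div>
  <a href="https://www.google.com/search">Search</a>
  <a href="https://news.yahoo.com/">News</a>
  <a href="https://example.com/">Example</a>
  <a name="no-href">Anchor without href</a>
</div>
"""

soup = BeautifulSoup(html_string, features="html.parser")
keywords = ["google", "yahoo", "msnbc"]

# Join the keywords into one alternation; re.escape protects any regex metacharacters
pattern = re.compile("|".join(re.escape(word) for word in keywords))

# A compiled regex passed as href= is searched against each tag's href value;
# tags with no href at all are simply not matched
matching_links = soup.find_all("a", href=pattern)
hrefs = [a["href"] for a in matching_links]
print(hrefs)
```

This is a substring match on the URL, the same as the loop above, so a keyword such as google would also match a link like https://notgoogle.example/. If that matters, tighten the pattern or parse the hostname first.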
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Nadav Porat |
