Read Yahoo RSS with Python
I am trying to parse a Yahoo RSS feed containing the following item, in order to put all the data into a pandas DataFrame:
<item>
<title>Retirement-Reform Bill Could Slash Taxes</title>
<link>
https://www.thestreet.com/retirement/secure-2-new-retirement-reform-bill-could-slash-taxes?puc=yahoo&cm_ven=YAHOO&yptr=yahoo
</link>
<source url="http://www.thestreet.com/">TheStreet.com</source>
</item>
The code so far:
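import pandas as pd
from requests_html import HTML
from requests_html import HTMLSession

session = HTMLSession()
# url holds the feed address (not shown in the question)
response = session.get(url)
with response as r:
    # first=False returns a list with every matching <item>
    items = r.html.find("item", first=False)
    for item in items:
        print(item.html)
        # find() returns a list of Element objects here
        link = item.find('link')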
link now contains [<Element 'link' >], and although I checked with dir(link) how to access the Element, I didn't find a way.
Then I saw a kind of conversion using item.html.
With it I am getting a <link/> instead of <link> ... </link>:
<item><title>Retirement-Reform Bill Could Slash Taxes</title><link/>https://www.thestreet.com/retirement/secure-2-new-retirement-reform-bill-could-slash-taxes?puc=yahoo&cm_ven=YAHOO&yptr=yahoo <pubdate>2022-03-29T16:22:00Z</pubdate><source url="http://www.thestreet.com/">TheStreet.com</source><guid ispermalink="false">secure-2-new-retirement-reform-bill-could-slash-taxes?puc=yahoo&cm_ven=YAHOO&yptr=yahoo</guid><content height="86" url="https://s.yimg.com/uu/api/res/1.2/C_nEPpNu9gDsTyEFMP4_Gw--~B/aD00MDA7dz02MDA7YXBwaWQ9eXRhY2h5b24-/https://media.zenfs.com/en/thestreet.com/2f7c38768e85cdba72bfed65497673a6" width="130"/><credit role="publishing company"/></item>
Any hints on how to access the link element?
Solution 1:[1]
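Consider treating the result as a string and splitting the value as follows:
# item.find('link') returns a list of elements, so work on the
# serialized item instead; item.html is a plain string.
html = item.html
# The parser emits an empty <link/>, so the URL is the text right after it.
# Take everything after <link/> up to the next tag:
link = html.split('<link/>')[1].split('<')[0].strip()
print(link)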
Prints:
https://www.thestreet.com/retirement/secure-2-new-retirement-reform-bill-could-slash-taxes?puc=yahoo&cm_ven=YAHOO&yptr=yahoo
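The string split above depends on the exact serialization of item.html. A less fragile alternative, offered here as a sketch and not as part of the original answer, is to parse the feed as XML with the standard library's xml.etree.ElementTree, which keeps the text inside <link> ... </link>. The names SAMPLE_RSS and parse_items are hypothetical, and the sample wraps the item from the question in a minimal rss/channel envelope. It also assumes the raw feed escapes & as &amp;, as well-formed XML must.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the item from the question inside a minimal RSS envelope.
# In raw XML the ampersands in the URL have to be written as &amp;.
SAMPLE_RSS = """<rss version="2.0">
<channel>
<item>
<title>Retirement-Reform Bill Could Slash Taxes</title>
<link>
https://www.thestreet.com/retirement/secure-2-new-retirement-reform-bill-could-slash-taxes?puc=yahoo&amp;cm_ven=YAHOO&amp;yptr=yahoo
</link>
<source url="http://www.thestreet.com/">TheStreet.com</source>
</item>
</channel>
</rss>"""


def parse_items(xml_data):
    """Return one dict per <item>, with surrounding whitespace stripped."""
    root = ET.fromstring(xml_data)
    rows = []
    for item in root.iter("item"):
        source = item.find("source")
        rows.append({
            # findtext returns None when the child element is missing
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "source": (source.text or "").strip() if source is not None else "",
            "source_url": source.get("url", "") if source is not None else "",
        })
    return rows


rows = parse_items(SAMPLE_RSS)
print(rows[0]["link"])
```

With a live feed, the same function could presumably be called as parse_items(response.content). Since rows is a list of dicts, pd.DataFrame(rows) then gives one row per item with the columns title, link, source and source_url.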
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Marco Aurelio Fernandez Reyes |
