How to find a regular expression to display headlines without extra characters?

I am attempting to figure out a regular expression that will display the headlines from a news feed of a stock.

This is the code I have so far; the regular expression I am using is "<title.*?</":

def yahoo_hl(ticker):
    import re, requests
    headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0"}
    xml = requests.get(f'https://feeds.finance.yahoo.com/rss/2.0/headline?s={ticker}', headers=headers).text
    news_headlines = re.findall(r'<title.*?</', xml, re.DOTALL) # put your regular expression between the single quotes
    return news_headlines

When I run it, each headline in the output is wrapped in "<title>" at the beginning and "</" at the end:

['<title>Yahoo! Finance: TSLA News</',
 '<title>Tesla Is About to Start Production at Its Berlin Gigafactory</',
 '<title>Tesla CEO Elon Musk Wants the U.S. and the World to Pump More Oil</',
 '<title>Tesla Gets Stronger With Oil Rising, Other EV Stocks Not So Much</',
 '<title>What Is The Boring Company?</']

The goal is to strip the leading "<title>" and the trailing "</" so that the headlines are output like this:

['Yahoo! Finance: TSLA News',
 'Tesla Is About to Start Production at Its Berlin Gigafactory',
 'Tesla CEO Elon Musk Wants the U.S. and the World to Pump More Oil',
 'Tesla Gets Stronger With Oil Rising, Other EV Stocks Not So Much',
 'What Is The Boring Company?']

Any help would be appreciated. Thank you in advance.
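
One possible approach, offered as a minimal sketch rather than a definitive answer, is to put the headline text in a capture group. When a pattern contains exactly one group, re.findall returns only the text matched by that group, so the surrounding tags are dropped. The helper name extract_titles and the sample string below are hypothetical; the sample stands in for the feed text so the example runs without a network request, and it assumes the titles appear as plain text between <title> and </title>.

```python
import re

def extract_titles(xml):
    """Return the text found between each <title> ... </title> pair."""
    # The parentheses form a capture group. With exactly one group in the
    # pattern, re.findall returns the group's text instead of the whole match.
    return re.findall(r'<title>(.*?)</title>', xml, re.DOTALL)

# Hypothetical sample standing in for the text returned by requests.get(...).text
sample = (
    "<rss><channel><title>Yahoo! Finance: TSLA News</title>"
    "<item><title>What Is The Boring Company?</title></item>"
    "</channel></rss>"
)

print(extract_titles(sample))
# ['Yahoo! Finance: TSLA News', 'What Is The Boring Company?']
```

In yahoo_hl, the same pattern could replace the one passed to re.findall. If the feed contains escaped entities such as &amp;, they would still appear in the results; passing each headline through html.unescape from the standard library is one way to decode them.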

python python-3.x

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
