How to webscrape data into a pandas dataframe?

I am trying to web-scrape data into a DataFrame. For a list of stocks, I want to pull the "High", "Median", and "Low" price targets into a DataFrame whose rows are the stock tickers and whose columns are "High", "Median", and "Low". How do I select just the numbers that follow the High/Median/Low headings and insert them into the DataFrame?

The HTML for the table containing the data I need is:

[<table aria-label="stock price targets data table" class="table value-pairs no-heading font--lato">
                <tbody>
                    <tr class="table__row">
                        <td class="table__cell w75">High</td>
                        <td class="table__cell w25">$215.00</td>
                    </tr>
                    <tr class="table__row">
                        <td class="table__cell w75">Median</td>
                        <td class="table__cell w25">$188.50</td>
                    </tr>
                    <tr class="table__row">
                        <td class="table__cell w75">Low</td>
                        <td class="table__cell w25">$173.00</td>
                    </tr>
                    <tr class="table__row is-highlighted">
                        <td class="table__cell w75">Average</td>
                        <td class="table__cell w25">$190.44</td>
                    </tr>
                    <tr class="table__row">
                        <td class="table__cell w75">Current Price</td>
                        <td class="table__cell w25">$178.64</td>
                    </tr>
                </tbody>
            </table>]

The code I'm using is:

import requests
import pandas as pd
from bs4 import BeautifulSoup as bs

Target_Equities_List = ["MSFT",
                        "K",
                        "JNJ"]
price_targets = pd.DataFrame(index=Target_Equities_List, columns=["High", "Median", "Low"])
for ticker in Target_Equities_List:
    url = 'https://www.marketwatch.com/investing/stock/' + ticker + '/analystestimates'
    page = requests.get(url)
    page_content = page.content
    soup = bs(page_content, 'html5lib')
    tabl = soup.find_all("table", {"aria-label": "stock price targets data table"})
    for t in tabl:
        rows = t.find_all("tr", {"class": "table__row"})
        for row in rows:
            print(row.get_text())

The output I'm getting is:

High
$410.00

Median
$360.00

Low
$298.18

Average
$360.97

Current Price
$284.47

High
$76.00

Median
$67.00

Low
$55.00

Average
$66.85

Current Price
$67.37

High
$215.00

Median
$188.50

Low
$173.00

Average
$190.44

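As an aside, the labels and values run together above because get_text() is called on the whole row; the two <td> cells in each row can be read separately instead. A minimal offline sketch, using the sample table HTML from the question rather than a live request:

```python
from bs4 import BeautifulSoup

# The table HTML from the question, inlined so the example runs offline.
html = """
<table aria-label="stock price targets data table">
  <tbody>
    <tr class="table__row"><td class="table__cell w75">High</td><td class="table__cell w25">$215.00</td></tr>
    <tr class="table__row"><td class="table__cell w75">Median</td><td class="table__cell w25">$188.50</td></tr>
    <tr class="table__row"><td class="table__cell w75">Low</td><td class="table__cell w25">$173.00</td></tr>
    <tr class="table__row is-highlighted"><td class="table__cell w75">Average</td><td class="table__cell w25">$190.44</td></tr>
    <tr class="table__row"><td class="table__cell w75">Current Price</td><td class="table__cell w25">$178.64</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
values = {}
for row in soup.find_all("tr", {"class": "table__row"}):
    # Each row holds exactly two cells: the label, then the dollar value.
    label_cell, value_cell = row.find_all("td")
    values[label_cell.get_text(strip=True)] = value_cell.get_text(strip=True)

# Keep only the three fields the question asks for.
wanted = {k: values[k] for k in ("High", "Median", "Low")}
print(wanted)  # {'High': '$215.00', 'Median': '$188.50', 'Low': '$173.00'}
```

The resulting dict can be assigned straight into a DataFrame row, e.g. price_targets.loc[ticker] = wanted.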

Solution 1:[1]

You can use pd.read_html to read the <table> HTML into a DataFrame, pd.concat all the per-ticker DataFrames together, and .pivot the combined DataFrame:

import requests
import pandas as pd
from bs4 import BeautifulSoup

Target_Equities_List = ["MSFT", "K", "JNJ"]

dfs = []
for ticker in Target_Equities_List:
    url = (
        f"https://www.marketwatch.com/investing/stock/{ticker}/analystestimates"
    )
    soup = BeautifulSoup(requests.get(url).content, "html5lib")
    tabl = soup.find_all(
        "table", {"aria-label": "stock price targets data table"}
    )
    df = pd.read_html(str(tabl))[0]
    df["Ticker"] = ticker
    dfs.append(df)

df = pd.concat(dfs)
df = df.pivot(index="Ticker", columns=0, values=1)[["High", "Median", "Low"]]
df = df.reset_index()
df.columns.name = None
df.index.name = None
print(df.to_markdown())

Prints:

|    | Ticker   | High    | Median  | Low     |
|---:|:---------|:--------|:--------|:--------|
|  0 | JNJ      | $215.00 | $188.50 | $173.00 |
|  1 | K        | $76.00  | $67.00  | $55.00  |
|  2 | MSFT     | $410.00 | $360.00 | $298.18 |
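The pivot step is what turns the long label/value rows into one row per ticker. A standalone illustration with hard-coded data (values taken from the output above; the integer column names 0 and 1 mimic what pd.read_html assigns when the table has no header row):

```python
import pandas as pd

# Long-format rows as produced by pd.read_html plus the added "Ticker" column.
df = pd.DataFrame(
    {
        0: ["High", "Median", "Low", "High", "Median", "Low"],
        1: ["$410.00", "$360.00", "$298.18", "$76.00", "$67.00", "$55.00"],
        "Ticker": ["MSFT"] * 3 + ["K"] * 3,
    }
)

# One row per ticker; column 0 supplies the new column names, column 1 the values.
wide = df.pivot(index="Ticker", columns=0, values=1)[["High", "Median", "Low"]]
print(wide)
```

Without the trailing [["High", "Median", "Low"]] selection, the columns would come out in alphabetical order and include every label present in the long data.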

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source

Solution 1: Andrej Kesely