'How to webscrape data into a pandas dataframe?
I am trying to webscrape data into a dataframe. I am trying to pull the "High" "Median" and "Low" numbers below for a list of stocks into a dataframe in which the rows are the stock names and the column titles are "High" "Median" and "Low". How do I specifically select the numbers following the H,M,L headings and insert them into the dataframe?
The HTML for the table containing the data I need is:
[<table aria-label="stock price targets data table" class="table value-pairs no-heading font--lato">
<tbody>
<tr class="table__row">
<td class="table__cell w75">High</td>
<td class="table__cell w25">$215.00</td>
</tr>
<tr class="table__row">
<td class="table__cell w75">Median</td>
<td class="table__cell w25">$188.50</td>
</tr>
<tr class="table__row">
<td class="table__cell w75">Low</td>
<td class="table__cell w25">$173.00</td>
</tr>
<tr class="table__row is-highlighted">
<td class="table__cell w75">Average</td>
<td class="table__cell w25">$190.44</td>
</tr>
<tr class="table__row">
<td class="table__cell w75">Current Price</td>
<td class="table__cell w25">$178.64</td>
</tr>
</tbody>
</table>]
The code I'm using is:
Target_Equities_List = ["MSFT",
"K",
"JNJ"]
price_targets = pd.DataFrame(index=Target_Equities_List, columns = ["High", "Median", "Low"])
for ticker in Target_Equities_List:
url = 'https://www.marketwatch.com/investing/stock/'+ticker+'/analystestimates'
page = requests.get(url)
page_content = page.content
soup = bs(page_content,'html5lib')
tabl = soup.find_all("table", {"aria-label" : "stock price targets data table"})
for t in tabl:
rows = t.find_all("tr", {"class" : "table__row"})
for row in rows:
print(row.get_text())
The output I'm getting is:
High
$410.00
Median
$360.00
Low
$298.18
Average
$360.97
Current Price
$284.47
High
$76.00
Median
$67.00
Low
$55.00
Average
$66.85
Current Price
$67.37
High
$215.00
Median
$188.50
Low
$173.00
Average
$190.44
Solution 1:[1]
You can use pd.read_html to read <table> into a dataframe, pd.concat all dataframes together and .pivot the final DataFrame:
import requests
import pandas as pd
from bs4 import BeautifulSoup
Target_Equities_List = ["MSFT", "K", "JNJ"]
dfs = []
for ticker in Target_Equities_List:
url = (
f"https://www.marketwatch.com/investing/stock/{ticker}/analystestimates"
)
soup = BeautifulSoup(requests.get(url).content, "html5lib")
tabl = soup.find_all(
"table", {"aria-label": "stock price targets data table"}
)
df = pd.read_html(str(tabl))[0]
df["Ticker"] = ticker
dfs.append(df)
df = pd.concat(dfs)
df = df.pivot(index="Ticker", columns=0, values=1)[["High", "Median", "Low"]]
df = df.reset_index()
df.columns.name = None
df.index.name = None
print(df.to_markdown())
Prints:
| Ticker | High | Median | Low | |
|---|---|---|---|---|
| 0 | JNJ | $215.00 | $188.50 | $173.00 |
| 1 | K | $76.00 | $67.00 | $55.00 |
| 2 | MSFT | $410.00 | $360.00 | $298.18 |
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Andrej Kesely |
