Parsing multiple tables from multiple URLs with similar column names
I am scraping a product website that has multiple pages, and every page has a table with similar columns but different values. Here's an example: https://www.benchmade.com/317-1-weekender.html and likewise here's another one: https://www.benchmade.com/15600or-raghorn.html. There are about 144 links like this.
What I want is a single table where the shared column names become the column headers and each row holds one product's values.
So something like this, which could be output as a CSV table:
| Blade Length | Blade Thickness | Open Length | etc. |
|---|---|---|---|
| 2.97/1.97" | 4.34/12.54 | 1.23/5.65 | ... |
| 4.24/2.23" | 2.34/5.63 | 5.43/2.90 | ... |
| 3.54/2.65 | 2.57/6.54 | 6.90/4.20 | ... |
| 7.65/5/43 | 4.65/3.56 | 3.32/4.54 | ... |
I have done this so far:
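import requests
import pandas as pd
from bs4 import BeautifulSoup as soup

# Assumed placeholder: the original HEADER definition was not shown in the question
HEADER = {"User-Agent": "Mozilla/5.0"}

product_links = []
for x in range(1, 4):
    # Pass the headers by keyword; the second positional argument of requests.get is params
    HTML = requests.get(f'https://www.benchmade.com/all-products.html?blade_edge=521%2C531%2C2231&p={x}&price=75-2400&product_list_limit=48', headers=HEADER)
    #HTML.status_code
    Booti = soup(HTML.content, "lxml")
    knife_items = Booti.find_all('li', class_="item product product-item")
    for items in knife_items:
        for links in items.find_all('a', class_="product photo product-item-photo", href=True):
            product_links.append(links['href'])

for links_2 in product_links:
    #testlinks = "https://www.benchmade.com/4010-211-collectors-edition-station-knife.html"
    # The first table on each product page is the specifications table
    Specifications_data = pd.read_html(links_2)[0]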
Any help would be appreciated!!! Thank you so much!
Solution 1:[1]
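This is quite easy to do with pandas: read the first table on each page, use its label column as the index, transpose it so the labels become column headers, and stack the resulting one-row frames.
import pandas as pd

urls = ['https://www.benchmade.com/317-1-weekender.html',
        'https://www.benchmade.com/15600or-raghorn.html']

frames = []
for url in urls:
    # Column 0 holds the spec labels; after set_index(0).T they become the column headers
    df = pd.read_html(url)[0].set_index(0).T
    frames.append(df)

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
final_df = pd.concat(frames, sort=False).reset_index(drop=True)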
Output:
print(final_df)
0             Blade Length:  Blade Thickness:  ...          Weight:  Sheath Weight:
0  2.97/1.97" | 7.16/5.00cm  0.090" | 2.286mm  ...  2.28oz | 64.64g             NaN
1          4.64" | 11.78 cm   0.09" | 2.286mm  ...      COMING SOON          21.26g
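Since the question asks for CSV output, here is a minimal sketch of the same idea that also strips the trailing colons from the headers and keeps track of which page each row came from. The helper names `spec_table_to_row` and `combine_spec_tables`, the `url` column, and the sample tables are illustrative assumptions, not part of the original answer; the sample frames stand in for what `pd.read_html(url)[0]` might return, and the sketch assumes each table has exactly two columns (label, value) with no repeated labels.

```python
import pandas as pd


def spec_table_to_row(table: pd.DataFrame, url: str) -> pd.DataFrame:
    """Turn a two-column label/value table into a single-row DataFrame."""
    # Normalize labels such as "Blade Length:" to "Blade Length"
    labels = table.iloc[:, 0].astype(str).str.strip().str.rstrip(":").tolist()
    values = table.iloc[:, 1].tolist()
    row = pd.DataFrame([values], columns=labels)
    # Hypothetical extra column so every row can be traced back to its page
    row.insert(0, "url", url)
    return row


def combine_spec_tables(tables_by_url: dict) -> pd.DataFrame:
    """Stack one row per page; columns missing on a page are filled with NaN."""
    rows = [spec_table_to_row(table, url) for url, table in tables_by_url.items()]
    return pd.concat(rows, ignore_index=True, sort=False)


# Stand-in tables (assumed shape), used here instead of live pd.read_html calls
sample = {
    "page-a": pd.DataFrame([["Blade Length:", '2.97"'], ["Weight:", "2.28oz"]]),
    "page-b": pd.DataFrame([["Blade Length:", '4.64"'], ["Sheath Weight:", "21.26g"]]),
}

combined = combine_spec_tables(sample)
csv_text = combined.to_csv(index=False)  # pass a file path instead to write to disk
```

With the scraped links, the input could be built as `{url: pd.read_html(url)[0] for url in product_links}`, and `combined.to_csv("specs.csv", index=False)` would write the file. Collecting the rows in a list and calling `pd.concat` once also avoids re-copying the growing frame on every iteration.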
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | chitown88 |
