Scrape table from the site using Selenium or BeautifulSoup
I am trying to parse the table from the site https://octopart.com/mcp3304-bi%2Fp-microchip-407390?r=sp#PriceAndStock. I tried locating the table by XPath with Selenium, but that fetches only the first row. I also tried parsing the HTML with BeautifulSoup, but I get unstructured text from the table.
Code trials:
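from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()  # assumes Chrome; any WebDriver works
driver.get('https://octopart.com/search?q=PMEG120G20ELRX&currency=USD&specs=0')
soup = BeautifulSoup(driver.page_source, 'html.parser')
table = soup.find('table')
for distributor in table.find_all('tbody'):
    rows = distributor.find_all('tr')
    for row in rows:
        # find() returns only the first <td> of each row
        data = row.find('td')
        print(data)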
Solution 1:[1]
To scrape the table, wait for it to become visible using WebDriverWait with visibility_of_element_located(), read the element's outerHTML, and pass that markup to pandas' read_html(), which returns a list of DataFrames. You can use the following locator strategy:
Code Block:
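from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()  # assumes Chrome; any WebDriver works
driver.get('https://octopart.com/search?q=PMEG120G20ELRX&currency=USD&specs=0')
# Wait up to 20 seconds for the results table to become visible, then grab its HTML
data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class, 'part')]//table"))).get_attribute("outerHTML")
# read_html() returns a list of DataFrames; StringIO avoids the deprecation
# warning that newer pandas versions raise for literal HTML strings
df = pd.read_html(StringIO(data))
print(df)
driver.quit()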
Console Output:
[ Unnamed: 0 Distributor SKU Stock MOQ ... 10 100 1,000 10,000 Updated
0 NaN Future Electronics3 4128873 500 1 ... 0.260 0.200 0.182 0.170 1d
1 NaN Digi-Key3 1727-PMEG120G20ELRXCT-ND 488 1 ... 0.378 0.257 0.145 0.145 <1m
2 NaN TTI PMEG120G20ELRX 18000 3000 ... NaN NaN NaN 0.124 1d
3 NaN Mouser 771-PMEG120G20ELRX 4461 1 ... 0.378 0.258 0.150 0.149 14m
4 NaN Verical PMEG120G20ELRX 6000 3000 ... NaN NaN NaN 0.178 <1m
[5 rows x 13 columns]]
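If you would rather stay with BeautifulSoup, the loop in the question prints only one cell per row because find('td') returns just the first match. A minimal sketch of the row-by-row alternative follows, assuming the page source has already been fetched. SAMPLE_HTML and table_rows are hypothetical names used for illustration, and the sample markup is a simplified stand-in for driver.page_source, not the real Octopart markup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for driver.page_source.
SAMPLE_HTML = """
<table>
  <thead><tr><th>Distributor</th><th>SKU</th><th>Stock</th></tr></thead>
  <tbody>
    <tr><td>Mouser</td><td>771-PMEG120G20ELRX</td><td>4461</td></tr>
    <tr><td>TTI</td><td>PMEG120G20ELRX</td><td>18000</td></tr>
  </tbody>
</table>
"""


def table_rows(html):
    """Return each data row of the first <table> as a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # find_all() returns every <td> in the row; find() would return only the first
        cells = tr.find_all("td")
        if cells:  # header rows hold only <th> cells, so they are skipped
            rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


rows = table_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

With a live page you would pass driver.page_source to table_rows() after the wait shown above has succeeded, since the table is rendered by JavaScript and may not be present immediately.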
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
