Scrape table from the site using selenium or beautifulsoup

I am trying to parse the table from the site https://octopart.com/mcp3304-bi%2Fp-microchip-407390?r=sp#PriceAndStock. I have tried locating the table by XPath with Selenium, but it fetches only the first row. I have also tried parsing the HTML with BeautifulSoup, but I get unstructured text from the table.

Code trials:

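from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()  # assumption: Chrome; any Selenium-supported browser should work
driver.get('https://octopart.com/search?q=PMEG120G20ELRX&currency=USD&specs=0')
soup = BeautifulSoup(driver.page_source, 'html.parser')

table = soup.find('table')
for distributor in table.find_all('tbody'):
    rows = distributor.find_all('tr')
    for row in rows:
        # find() returns only the first <td> of each row
        data = row.find('td')
        # printing a Tag emits its raw HTML
        print(data)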


Solution 1:[1]

To scrape the table from the website, induce WebDriverWait for visibility_of_element_located() so the table is rendered before you read it, then pass the element's outerHTML to pandas.read_html() to get the rows as a DataFrame. You can use the following locator strategy:

Code Block:

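from io import StringIO

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

driver = webdriver.Chrome()  # assumption: Chrome; any Selenium-supported browser should work
driver.get('https://octopart.com/search?q=PMEG120G20ELRX&currency=USD&specs=0')
# Wait up to 20 seconds for the table to become visible, then take its HTML
data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class, 'part')]//table"))).get_attribute("outerHTML")
# read_html() returns a list of DataFrames, one per <table> found;
# StringIO avoids the literal-HTML deprecation warning in recent pandas versions
df = pd.read_html(StringIO(data))
print(df)
driver.quit()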

Console Output:

[   Unnamed: 0          Distributor                       SKU  Stock   MOQ  ...     10    100  1,000  10,000  Updated
0         NaN  Future Electronics3                   4128873    500     1  ...  0.260  0.200  0.182   0.170       1d
1         NaN            Digi-Key3  1727-PMEG120G20ELRXCT-ND    488     1  ...  0.378  0.257  0.145   0.145      <1m
2         NaN                  TTI            PMEG120G20ELRX  18000  3000  ...    NaN    NaN    NaN   0.124       1d
3         NaN               Mouser        771-PMEG120G20ELRX   4461     1  ...  0.378  0.258  0.150   0.149      14m
4         NaN              Verical            PMEG120G20ELRX   6000  3000  ...    NaN    NaN    NaN   0.178      <1m

[5 rows x 13 columns]]
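
The square brackets in the output appear because pd.read_html() returns a list of DataFrames rather than a single one. A minimal sketch of a follow-up step, assuming you want only the first table and no all-NaN columns such as "Unnamed: 0". The helper name first_table is hypothetical, and the sample list is a hand-built stand-in that uses two rows from the output above instead of a live page:

```python
import pandas as pd


def first_table(tables):
    """Return the first DataFrame of a read_html-style list, minus empty columns."""
    df = tables[0]
    # Drop every column whose values are all NaN (e.g. "Unnamed: 0").
    return df.dropna(axis=1, how="all")


# Hypothetical stand-in for the list returned by pd.read_html().
tables = [
    pd.DataFrame(
        {
            "Unnamed: 0": [float("nan"), float("nan")],
            "Distributor": ["TTI", "Mouser"],
            "SKU": ["PMEG120G20ELRX", "771-PMEG120G20ELRX"],
            "Stock": [18000, 4461],
        }
    )
]

prices = first_table(tables)
print(prices.columns.tolist())  # ['Distributor', 'SKU', 'Stock']
```

With the real result you would call first_table(df) on the list printed above.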

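If you would rather stay with BeautifulSoup, the unstructured output in the question likely comes from two things: row.find('td') returns only the first cell of each row, and printing a Tag emits raw HTML. A minimal sketch that collects every cell's text instead. SAMPLE_HTML is hypothetical, simplified markup (the real page's structure may differ), and table_rows is an illustrative helper name:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for driver.page_source.
SAMPLE_HTML = """
<table>
  <thead><tr><th>Distributor</th><th>SKU</th><th>Stock</th></tr></thead>
  <tbody>
    <tr><td>TTI</td><td>PMEG120G20ELRX</td><td>18000</td></tr>
    <tr><td>Mouser</td><td>771-PMEG120G20ELRX</td><td>4461</td></tr>
  </tbody>
</table>
"""


def table_rows(html):
    """Return the first table as a list of rows, each a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    rows = []
    for tr in table.find_all("tr"):
        # find_all() collects every header and data cell, not just the first.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


rows = table_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

On a live page you would still need to wait for the table to render before reading driver.page_source, as in Solution 1.
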
Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1