Trying to create a web scraper for Dell drivers using Python 3 and Beautiful Soup

I am trying to create a web scraper to grab information about Dell drivers from their website. Apparently, the site uses JavaScript to load the driver data into the page, and I am having difficulty getting the driver info out of it. This is what I have cobbled together so far:

from bs4 import BeautifulSoup
import urllib.request
import json

resp = urllib.request.urlopen("https://www.dell.com/support/home/en-us/product-support/product/precision-15-5520-laptop/drivers")
soup = BeautifulSoup(resp, 'html.parser', from_encoding=resp.info().get_param('charset'))

So far, none of these attempts to get the driver data have worked:

data = json.loads(soup.find('script', type='text/preloaded').text)

data = json.loads(soup.find('script', type='application/x-suppress').text)

data = json.loads(soup.find('script', type='text/javascript').text)

data = json.loads(soup.find('script', type='application/ld+json').text)
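Each of those lines can fail in one of two ways: `soup.find(...)` returns `None` when no `<script>` tag has that `type` (so `.text` raises `AttributeError`), or the tag exists but its body is JavaScript rather than JSON (so `json.loads` raises `json.JSONDecodeError`). Rather than guessing `type` values, it can help to list what the downloaded HTML actually contains. This is a sketch only; `describe_scripts` and `load_script_json` are hypothetical helper names, not part of Beautiful Soup, and the sample HTML is made up.

```python
import json
from bs4 import BeautifulSoup


def describe_scripts(soup):
    """Return (type attribute, first 40 chars of body) for every <script> tag."""
    summary = []
    for tag in soup.find_all("script"):
        body = (tag.string or "").strip()
        summary.append((tag.get("type"), body[:40]))
    return summary


def load_script_json(soup, script_type):
    """Parse the first <script type=script_type> as JSON, or return None.

    Returns None instead of raising when the tag is missing or when its
    body is JavaScript rather than JSON.
    """
    tag = soup.find("script", type=script_type)
    if tag is None or tag.string is None:
        return None
    try:
        return json.loads(tag.string)
    except json.JSONDecodeError:
        return None


if __name__ == "__main__":
    # Tiny stand-in page; the real Dell page's scripts will differ.
    html = """
    <html><head>
      <script type="application/ld+json">{"@type": "Product", "name": "demo"}</script>
      <script type="text/javascript">var x = 1;</script>
    </head></html>
    """
    soup = BeautifulSoup(html, "html.parser")
    for script_type, preview in describe_scripts(soup):
        print(script_type, "->", preview)
    print(load_script_json(soup, "application/ld+json"))  # parsed dict
    print(load_script_json(soup, "text/javascript"))      # None: not JSON
    print(load_script_json(soup, "text/preloaded"))       # None: no such tag
```

Run `describe_scripts` on the `soup` built from the real response: if none of the scripts hold the driver list, the data is being fetched later by JavaScript, which is what the answers below work around.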

I am not very skilled at Python and have been looking all over, trying to cobble together something that works. Any help getting a little further would be greatly appreciated.



Solution 1:[1]

You can use Selenium, which runs the page's JavaScript in a real browser, and then parse the rendered HTML:

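from selenium import webdriver
import time
from bs4 import BeautifulSoup

driver = webdriver.Chrome()

# Load the page in the browser so its JavaScript can fill in the driver list
driver.get('https://www.dell.com/support/home/en-us/product-support/product/precision-15-5520-laptop/drivers')

time.sleep(3)  # crude fixed wait; may need to be longer on a slow connection

page = driver.page_source  # HTML after the JavaScript has run

driver.quit()  # quit() ends the browser session; close() only closes the current window

soup = BeautifulSoup(page, 'html5lib')  # the html5lib parser is a separate package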

Solution 2:[2]

I was able to get Sushil's answer working on my machine with some minor changes:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
from bs4 import BeautifulSoup

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('C:/temp/chromedriver_win32/chromedriver.exe'))

driver.get('https://www.dell.com/support/home/en-us/product-support/product/precision-15-5520-laptop/drivers')

time.sleep(3)

page = driver.page_source

driver.quit()

soup = BeautifulSoup(page, 'html.parser')

results = soup.find(id='downloads-table')

results2 = results.find_all(class_='dl-desk-view')
# A multi-class string only matches when the class attribute is exactly this string, in this order
results3 = results.find_all(class_='details-control sorting_1')
results4 = results.find_all(class_='details-control')
results5 = results.find_all(class_='btn-download-lg btn btn-sm no-break text-decoration-none dellmetrics-driverdownloads btn-outline-primary')

The problem, though, is that this still only gets me 10 of the 79 drivers.

I need a way to get all of the available drivers.
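If you stay with Selenium, the rendered `downloads-table` can be read generically instead of through long class strings. This is an assumption-heavy sketch: `extract_table_rows` is a hypothetical helper, and it presumes the driver list is an ordinary `<table>` with its data in `<td>` cells, which I have not checked against Dell's markup. It only sees rows that are present in `page_source`, so it does not by itself fix the 10-of-79 problem. Class names such as `sorting_1` and `details-control` look like those of the DataTables jQuery plugin, which shows 10 rows per page by default, so the remaining drivers are probably on later pages and would have to be paged through in the browser (or fetched with the JSON approach in Solution 3).

```python
from bs4 import BeautifulSoup


def extract_table_rows(html, table_id="downloads-table"):
    """Return the text of each <td> cell, row by row, for one table.

    Assumes data rows are plain <tr>/<td> elements; header rows made
    only of <th> cells produce no <td> text and are skipped.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find(id=table_id)
    if table is None:
        return []  # table not rendered (yet) or id has changed
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(" ", strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)
    return rows
```

Usage: call `extract_table_rows(driver.page_source)` while the browser is still open, once per table page if you page through it.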

Solution 3:[3]

I was able to pull the JSON file that has the driver information. This saves a lot of hassle compared with using a web driver or other tricks.

Example for the Dell Precision 7760 with Windows 10: https://www.dell.com/support/driver/en-us/ips/api/driverlist/fetchdriversbyproduct?productcode=precision-17-7760-laptop&oscode=WT64A (Note the "productcode" and "oscode" parameters.)

For this to work, the request must include the header "X-Requested-With" set to "XMLHttpRequest". Without it, you will get a "no content" result.

Pretty-print the resulting JSON and you should easily see its structure, including all of the driver data shown on the support website.
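A minimal sketch of this approach using only the standard library, so no browser is needed. The endpoint, the `productcode`/`oscode` parameters and the `X-Requested-With` header come from the answer above; the `User-Agent` value and the helper names `build_driver_request`/`fetch_drivers` are my own assumptions. The response's field names are not described here, so the code just decodes and returns whatever JSON comes back. Since the endpoint is not a documented public API, it may change.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the example URL in the answer
BASE_URL = ("https://www.dell.com/support/driver/en-us/ips/api/"
            "driverlist/fetchdriversbyproduct")


def build_driver_request(productcode, oscode):
    """Build a GET request for one product/OS pair."""
    query = urllib.parse.urlencode({"productcode": productcode, "oscode": oscode})
    return urllib.request.Request(
        f"{BASE_URL}?{query}",
        headers={
            # Without this header the endpoint reportedly returns no content
            "X-Requested-With": "XMLHttpRequest",
            # Assumption: a browser-like User-Agent; some sites reject urllib's default
            "User-Agent": "Mozilla/5.0",
        },
    )


def fetch_drivers(productcode, oscode, timeout=30):
    """Download the driver list and decode it from JSON (network call)."""
    req = build_driver_request(productcode, oscode)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return json.loads(resp.read().decode(charset))


# Usage (performs a network request):
#   data = fetch_drivers("precision-17-7760-laptop", "WT64A")
#   print(json.dumps(data, indent=2))  # pretty-print to inspect the structure
```

For the Precision 15 5520 from the question, `productcode` would presumably be `precision-15-5520-laptop` (the slug in the question's URL); I have not checked which `oscode` values it accepts.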

Solution 4:[4]

My approach below:

Main component with router

<Switch>
  <Route 
    exact 
    path="/" 
    render={(props) => 
      <ChildComponent1
        {...props} 
        debug={this.props.debug}
        context={this.props.context}
        inDesignMode={this.props.inDesignMode}
      />}
  />
  <Route 
    path="/newcontract/:parentFolder?/:parentId?" 
    render={(props) => 
      <ChildComponent2
        {...props} 
        debug={this.props.debug}
        context={this.props.context}
        inDesignMode={this.props.inDesignMode}
      />}
  />
  <Route 
    path="/editcontract/:folder/:id" 
    render={(props) => 
      <ChildComponent3
        {...props} 
        debug={this.props.debug}
        context={this.props.context}
        inDesignMode={this.props.inDesignMode}
      />}
  />
  <Route 
    path="/viewcontract/:folder/:id" 
    render={(props) => 
      <ChildComponent4
        {...props}
        debug={this.props.debug}
        context={this.props.context}
        inDesignMode={this.props.inDesignMode}
      />
    }
  />
  <Route>
      {null} 
  </Route>
</Switch>

ChildComponent

import { RouteComponentProps } from 'react-router';

export default class ChildComponent1 extends React.Component<IChildComponent1Props & RouteComponentProps, IChildComponent1State> {
  private _folder: string;
  private _id: string;

  constructor(props: IChildComponent1Props & RouteComponentProps) {
    super(props);
    // set initial state
    this.state = {
      ...
    };
    this._folder = props.match.params['folder'];
    this._id = props.match.params['id'];
  }

  public componentDidUpdate(prevProps: IChildComponent1Props & RouteComponentProps, prevState: IChildComponent1State): void {
    ...
    if ((prevProps.match.params['folder'] !== this.props.match.params['folder']) || (prevProps.match.params['id'] !== this.props.match.params['id'])) {
      this._folder = this.props.match.params['folder'];
      this._id = this.props.match.params['id'];
    }
    // context is in this.props.context ...
    ...

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Sushil
Solution 2
Solution 3 Aaron
Solution 4 Matej