Trying to create a web scraper for Dell drivers using Python 3 and Beautiful Soup
I am trying to create a web scraper to grab information about Dell drivers from their website. Apparently, the site uses JavaScript to load the driver data into the page, so I am having difficulty getting the driver info from the page. This is what I have cobbled together so far:
from bs4 import BeautifulSoup
import urllib.request
import json
resp = urllib.request.urlopen("https://www.dell.com/support/home/en-us/product-support/product/precision-15-5520-laptop/drivers")
soup = BeautifulSoup(resp, 'html.parser', from_encoding=resp.info().get_param('charset'))
So far, none of these attempts to get the driver data have worked:
data = json.loads(soup.find('script', type='text/preloaded').text)
data = json.loads(soup.find('script', type='application/x-suppress').text)
data = json.loads(soup.find('script', type='text/javascript').text)
data = json.loads(soup.find('script', type='application/ld+json').text)
I am not very skilled at Python, and I have been looking all over trying to cobble together something that works. Any assistance to help me get a little further would be greatly appreciated.
Solution 1:[1]
You can use Selenium:
from selenium import webdriver
import time
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
driver.get('https://www.dell.com/support/home/en-us/product-support/product/precision-15-5520-laptop/drivers')
time.sleep(3)
page = driver.page_source
driver.close()
soup = BeautifulSoup(page,'html5lib')
Solution 2:[2]
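I was able to get Sushil's answer working on my machine with some minor changes:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
from bs4 import BeautifulSoup
# Selenium 4 takes the driver path through a Service object; older releases accepted it as the first positional argument.
driver = webdriver.Chrome(service=Service('C:/temp/chromedriver_win32/chromedriver.exe'))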
driver.get('https://www.dell.com/support/home/en-us/product-support/product/precision-15-5520-laptop/drivers')
time.sleep(3)
page = driver.page_source
driver.close()
soup = BeautifulSoup(page,'html.parser')
results = soup.find(id='downloads-table')
results2 = results.find_all(class_='dl-desk-view')
results3 = results.find_all(class_='details-control sorting_1')
results4 = results.find_all(class_='details-control')
results5 = results.find_all(class_='btn-download-lg btn btn-sm no-break text-decoration-none dellmetrics-driverdownloads btn-outline-primary')
The problem, though, is that this still only gets me 10 of the 79 drivers.
I need a way to get all of the available drivers.
Solution 3:[3]
I was able to pull the JSON file that has the driver information. This saves a lot of hassle compared with using a web driver or other tricks.
Example for Dell Precision 7760 with Windows 10: https://www.dell.com/support/driver/en-us/ips/api/driverlist/fetchdriversbyproduct?productcode=precision-17-7760-laptop&oscode=WT64A (Note: "productcode" and "oscode" parameters.)
For this to work, the request must include the header "X-Requested-With" with the value "XMLHttpRequest". Without it, you will get a "no content" result.
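As a rough sketch of the request described above, using only the standard library: the endpoint, the "productcode" and "oscode" parameters, and the header all come from this answer. The function names `build_request` and `fetch_drivers`, the timeout, and the empty-body check are my own assumptions, and the site may require additional headers (for example a User-Agent) that are not covered here.

```python
import json
import urllib.parse
import urllib.request

# Endpoint copied from the example URL in this answer.
API_URL = ("https://www.dell.com/support/driver/en-us/ips/api/driverlist/"
           "fetchdriversbyproduct")


def build_request(product_code, os_code):
    """Build the GET request, including the X-Requested-With header."""
    query = urllib.parse.urlencode(
        {"productcode": product_code, "oscode": os_code}
    )
    return urllib.request.Request(
        f"{API_URL}?{query}",
        headers={"X-Requested-With": "XMLHttpRequest"},
    )


def fetch_drivers(product_code, os_code, timeout=30):
    """Fetch and decode the driver list (hypothetical helper)."""
    request = build_request(product_code, os_code)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = resp.read()
    if not body:
        # An empty body is assumed to correspond to the "no content" case.
        raise ValueError("empty response; check the header and parameters")
    return json.loads(body)
```

Calling `fetch_drivers("precision-17-7760-laptop", "WT64A")` should return the decoded JSON, provided the endpoint still behaves as described here.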
Format the resulting JSON and you should easily see the structure of the results including all of the driver data that you see on the support website.
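If the raw JSON is too large to read comfortably, one option is to print only its keys and value types. This is just a sketch: `outline` is a hypothetical helper of mine, and the sample data below is made up to demonstrate it. It is not Dell's actual schema, which I have not reproduced here.

```python
import json


def outline(value, indent=0, max_items=1):
    """Return lines describing the keys and value types of decoded JSON."""
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            lines.append(f"{pad}{key}: {type(item).__name__}")
            if isinstance(item, (dict, list)):
                lines.extend(outline(item, indent + 1, max_items))
    elif isinstance(value, list):
        # Only inspect the first few elements; list items usually share a shape.
        for item in value[:max_items]:
            lines.append(f"{pad}[item]: {type(item).__name__}")
            if isinstance(item, (dict, list)):
                lines.extend(outline(item, indent + 1, max_items))
    return lines


# Made-up sample purely to show the helper; NOT the real response layout.
sample = json.loads('{"Items": [{"Name": "x", "Size": 1}], "Count": 1}')
print("\n".join(outline(sample)))
```

For a plain dump instead, `json.dumps(data, indent=2)` gives the fully formatted text.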
Solution 4:[4]
My approach below:
Main component with router
<Switch>
  <Route
    exact
    path="/"
    render={(props) =>
      <ChildComponent1
        {...props}
        debug={this.props.debug}
        context={this.props.context}
        inDesignMode={this.props.inDesignMode}
      />}
  />
  <Route
    path="/newcontract/:parentFolder?/:parentId?"
    render={(props) =>
      <ChildComponent2
        {...props}
        debug={this.props.debug}
        context={this.props.context}
        inDesignMode={this.props.inDesignMode}
      />}
  />
  <Route
    path="/editcontract/:folder/:id"
    render={(props) =>
      <ChildComponent3
        {...props}
        debug={this.props.debug}
        context={this.props.context}
        inDesignMode={this.props.inDesignMode}
      />}
  />
  <Route
    path="/viewcontract/:folder/:id"
    render={(props) =>
      <ChildComponent4
        {...props}
        debug={this.props.debug}
        context={this.props.context}
        inDesignMode={this.props.inDesignMode}
      />
    }
  />
  <Route>
    {null}
  </Route>
</Switch>
ChildComponent
import { RouteComponentProps } from 'react-router';
export default class ChildComponent1 extends React.Component<IChildComponent1Props & RouteComponentProps, IChildComponent1State> {
  private _folder: string;
  private _id: string;
  constructor(props: IChildComponent1Props & RouteComponentProps) {
    super(props);
    // set initial state
    this.state = {
      ...
    };
    this._folder = props.match.params['folder'];
    this._id = props.match.params['id'];
  }
  public componentDidUpdate(prevProps: IChildComponent1Props & RouteComponentProps, prevState: IChildComponent1State): void {
    ...
    if ((prevProps.match.params['folder'] !== this.props.match.params['folder']) || (prevProps.match.params['id'] !== this.props.match.params['id'])) {
      this._folder = this.props.match.params['folder'];
      this._id = this.props.match.params['id'];
    }
    // context is in this.props.context ...
    ...
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Sushil |
| Solution 2 | |
| Solution 3 | Aaron |
| Solution 4 | Matej |
