How to detect a missing XPath and continue running a for loop (Python)
How can I detect a missing XPath and continue running the loop? This is what I do when I scrape text from a website:
url = "websiteurl" + partno
(i have found this url format to be consistent across products)
response = requests.get(url)
html_element = html.fromstring(response.text)
try:
prodno = html_element.xpath('/html/body/div[1]/div[2]/div/pes-product-main/section/div[2]/div/pes-main-area/pes-main-product-info//div/div/h2/text()')
if str(prodno) != ("['" + partno + "']"): # need [' '] because when the string is pulled from the dataframe, it will have those and flag the comparison as wrong.
print(prodno, partno)
prodscrapefailure.append(prodno)
elif str(prodno) == ("['" + partno + "']"):
print('yay')
productno.append(prodno)
except NoSuchElementException:
pass
I have a list of part numbers to scrape: 110XCA20300, 140ACI03000, 140ACI04000, 140ACO02000, cheesecake, 140ARI03010, 140ATI03000, 140CPS11420, 140CPS11420C, 140CPS12420
My aim is to scrape all of the pages loaded using requests_html; 'cheesecake' is a stand-in for an incorrect product code. I am trying to recognise when an XPath can't be found because the error page has loaded instead, write the incorrect product code to a data frame, and then continue with the rest of the product codes.
I have been trying to use an exception handler, but I keep hitting the error `ValueError: 1 columns passed, passed data had 0 columns`.
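For context, that ValueError typically comes from building a one-column DataFrame out of rows that are empty lists. The DataFrame construction isn't shown in the question, so this is only a guess at the trigger, but appending an empty `xpath()` result (`[]`) to the failure list and then constructing the frame reproduces it:

```python
import pandas as pd

# An empty xpath() result was appended instead of a part-number string,
# so this row has zero columns while one column name was passed.
prodscrapefailure = [[]]

# Raises: ValueError: 1 columns passed, passed data had 0 columns
df = pd.DataFrame(prodscrapefailure, columns=['partno'])
```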
Solution 1:[1]
I realised that in this instance I could simply append the input `partno` to the `prodscrapefailure` list instead of `prodno`, which, of course, didn't exist in the case that nothing was found.
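A minimal sketch of that fix, with the list names taken from the question and the surrounding for loop assumed (it isn't shown in the original). Since lxml's `xpath()` returns an empty list rather than raising `NoSuchElementException` (a Selenium exception), an emptiness check replaces the try/except:

```python
import requests
from lxml import html

partnos = ["110XCA20300", "140ACI03000", "cheesecake", "140CPS11420"]
productno = []          # part numbers confirmed on their product page
prodscrapefailure = []  # part numbers whose page had no matching XPath

for partno in partnos:
    response = requests.get("websiteurl" + partno)
    html_element = html.fromstring(response.text)
    prodno = html_element.xpath('/html/body/div[1]/div[2]/div/pes-product-main/section/div[2]/div/pes-main-area/pes-main-product-info//div/div/h2/text()')
    if prodno and prodno[0] == partno:
        print('yay')
        productno.append(prodno[0])
    else:
        # Append the input partno (which always exists), not prodno,
        # which is an empty list when the XPath matched nothing.
        print(prodno, partno)
        prodscrapefailure.append(partno)
```

Because the failure list now holds plain strings instead of empty lists, `pd.DataFrame(prodscrapefailure, columns=['partno'])` builds cleanly and the loop carries on past the bad product code.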
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mmccrone |
