Webscraping with Selenium can't find element
For my data project, I am trying to scrape a website with Selenium. It loads new articles by incrementing the page number: https://geschenkly.de/page/1/, then 2, 3, 4 and so on. Starting with the first page, the site loads in the Chrome WebDriver, but whenever I try to find an element, the result is either empty or the element doesn't exist:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
import json
chrome_options = Options()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument("--headless")
chrome_options.add_argument(f'user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36')
chrome_options.add_argument("window-size=1920,1080")
s=Service(ChromeDriverManager().install())
# Selenium 4 expects the keyword "options"; "chrome_options" is deprecated
driver = webdriver.Chrome(service=s, options=chrome_options)
#page = 1
driver.get('https://geschenkly.de/page/1/')
wait = wait(driver, 60)
elements = driver.find_elements(By.CLASS_NAME, "woocommerce-LoopProduct-link woocommerce-loop-product__link")
The class name belongs to the links that point to the individual article pages. I can find them when inspecting the page, but in Selenium, elements is an empty list.
Solution 1:[1]
woocommerce-LoopProduct-link woocommerce-loop-product__link is actually two class names separated by a space. By.CLASS_NAME accepts only a single class name, so it cannot locate elements with a value like this.
To locate an element by multiple class names, use By.CSS_SELECTOR or By.XPATH.
You also need to actually use the expected conditions to wait for the elements. Creating the wait object without calling until() on it has no effect.
Your locator could also be improved.
This would work better:
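As a rough illustration of the point above, a space-separated class attribute can be turned into a compound CSS selector by prefixing each class name with a dot. The helper name classes_to_css is hypothetical and not part of Selenium, and the sketch assumes the class names contain nothing that needs CSS escaping.

```python
def classes_to_css(class_attr: str, tag: str = "") -> str:
    """Build a compound CSS selector from a space-separated class attribute.

    Hypothetical helper, not part of Selenium. Assumes the class names
    contain no characters that would need CSS escaping.
    """
    names = class_attr.split()
    if not names:
        raise ValueError("class_attr must contain at least one class name")
    # ".a.b" matches elements that carry both class "a" and class "b".
    return tag + "".join("." + name for name in names)


selector = classes_to_css(
    "woocommerce-LoopProduct-link woocommerce-loop-product__link", tag="a"
)
print(selector)
# The result can then be passed to Selenium with By.CSS_SELECTOR, e.g.
# driver.find_elements(By.CSS_SELECTOR, selector)
```

The optional tag argument narrows the match to one element type, which is usually enough to avoid picking up unrelated elements that share the same classes.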
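driver.get('https://geschenkly.de/page/1/')
# "wait" is the WebDriverWait alias imported in the question; a new name avoids shadowing it
driver_wait = wait(driver, 60)
# the *_all_* condition returns a list of all matching elements instead of only the first one
elements = driver_wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".woocommerce-LoopProduct-link.woocommerce-loop-product__link")))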
The locator above also matches some irrelevant elements.
The more specific locator below returns about half as many elements, which looks more correct:
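driver.get('https://geschenkly.de/page/1/')
# "wait" is the WebDriverWait alias imported in the question; a new name avoids shadowing it
driver_wait = wait(driver, 60)
# the *_all_* condition returns a list of all matching elements instead of only the first one
elements = driver_wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div.thumb-wrapper.zoom a.woocommerce-LoopProduct-link.woocommerce-loop-product__link")))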
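Since the question mentions walking through /page/1/, /page/2/ and so on, here is a minimal sketch of how the wait and the narrower locator might be combined with the page counter. The names page_url, unique_hrefs and scrape_links are made up for illustration. The sketch also assumes that a page past the last one shows no matching links before the timeout, which has not been verified against the site.

```python
BASE_URL = "https://geschenkly.de/page/{}/"
LOCATOR = (
    "div.thumb-wrapper.zoom "
    "a.woocommerce-LoopProduct-link.woocommerce-loop-product__link"
)


def page_url(page: int) -> str:
    """Return the listing URL for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return BASE_URL.format(page)


def unique_hrefs(elements) -> list:
    """Collect href attributes in order, skipping empty values and duplicates."""
    seen = set()
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs


def scrape_links(driver, max_pages: int = 3, timeout: int = 20) -> list:
    """Visit pages 1..max_pages and return the product links found per page.

    Stops early when no matching element appears within the timeout
    (assumed here to mean that the last page has been passed).
    """
    # Imported lazily so the pure helpers above work without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    links = []
    for page in range(1, max_pages + 1):
        driver.get(page_url(page))
        try:
            # Waits until at least one matching element is present in the DOM.
            elements = WebDriverWait(driver, timeout).until(
                EC.presence_of_all_elements_located((By.CSS_SELECTOR, LOCATOR))
            )
        except TimeoutException:
            break
        links.extend(unique_hrefs(elements))
    return links
```

With the driver from the question, this could be called as scrape_links(driver, max_pages=5). A shorter timeout than the 60 seconds used above keeps the final, empty page from stalling the run for a full minute.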
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
