Can't find page elements using Selenium in Python
I am trying to extract the review text from this page.
Here's a condensed version of the HTML shown in my Chrome browser inspector:
<div id="module_product_review" class="pdp-block module">
  <div class="lazyload-wrapper ">
    <div class="pdp-mod-review" data-spm="ratings_reviews" lazada_pdp_review="expose" itemid="1615006548" data-nosnippet="true" data-aplus-ae="x1_490e4591" data-spm-anchor-id="a2o42.pdp_revamp.0.ratings_reviews.508466b1OJjCoH">
      <div>...</div>
      <div>...</div>
      <div>
        <div class="mod-reviews">
          <div class="item">
            <div class="top">...</div>
            <div class="middle">...</div>
            <div class="item-content">
              <div class="content" data-spm-anchor-id="a2o42.pdp_revamp.ratings_reviews.i3.508466b1OJjCoH">Slim and light. feel good. better if providing 16G version.</div>
              <div class="review-image">...</div>
              <div class="skuInfo">Color Family:MYSTIC SILVER</div>
              <div class="bottom">...</div>
              <div class="dialogs"></div>
            </div>
            <div class="seller-reply-wrapper">...</div>
          </div>
          <div class="item">...</div>
          <div class="item">...</div>
          <div class="item">...</div>
          <div class="item">...</div>
        </div>
      </div>
    </div>
  </div>
</div>
I'm trying to extract the "Slim and light. feel good. better if providing 16G version." text from the class="content" element.
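For context, this is roughly the extraction I expect to run with BeautifulSoup once the review HTML is actually present (the selector and the small inline sample below are just based on the condensed structure above):

from bs4 import BeautifulSoup

# Small inline sample mirroring the condensed structure above, only to show the selector.
sample_html = """
<div class="mod-reviews">
  <div class="item">
    <div class="item-content">
      <div class="content">Slim and light. feel good. better if providing 16G version.</div>
      <div class="skuInfo">Color Family:MYSTIC SILVER</div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(sample_html, "lxml")
for review in soup.select("div.mod-reviews div.item div.content"):
    print(review.get_text(strip=True))  # -> Slim and light. feel good. better if providing 16G version.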
But when I try to retrieve the id="module_product_review" element using Selenium in Python, this is what I get instead:
<div class="pdp-block module" id="module_product_review">
  <div class="lazyload-wrapper">
    <div class="lazy-load-placeholder">
      <div class="lazy-load-skeleton">
      </div>
    </div>
  </div>
</div>
This is my code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup

op = webdriver.ChromeOptions()
op.add_argument('--headless')
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=op)
driver.get("https://www.lazada.sg/products/huawei-matebook-d14-laptop-14-fullview-display-intel-i5-processor-8gb512gb-intel-uhd-graphics-i1615006548-s7594078907.html?spm=a2o42.searchlist.list.3.15064828Od60kh&search=1&freeshipping=1")
# Grab the review module and pretty-print its HTML with BeautifulSoup
module_product_review = driver.find_element(By.ID, "module_product_review")
html = module_product_review.get_attribute("outerHTML")
soup = BeautifulSoup(html, 'lxml')
print(soup.prettify())
I thought it might be because I was retrieving the element before it had fully loaded, so I tried sleeping for 30 seconds before calling find_element(), but I still get the same result. As far as I can tell, it isn't an iframe or shadow-root issue either.
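For reference, the sleep-based attempt looked roughly like this (a sketch; the 30-second figure is arbitrary, and the iframe count at the end is just the kind of sanity check I mean):

import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# Same headless setup as above, with an explicit 30-second pause before querying.
op = webdriver.ChromeOptions()
op.add_argument('--headless')
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=op)
driver.get("https://www.lazada.sg/products/huawei-matebook-d14-laptop-14-fullview-display-intel-i5-processor-8gb512gb-intel-uhd-graphics-i1615006548-s7594078907.html?spm=a2o42.searchlist.list.3.15064828Od60kh&search=1&freeshipping=1")

time.sleep(30)  # give the lazy-loaded review module time to render
module_product_review = driver.find_element(By.ID, "module_product_review")
print(module_product_review.get_attribute("outerHTML"))  # still only the lazy-load placeholder
print(len(driver.find_elements(By.TAG_NAME, "iframe")))  # sanity check: the reviews don't appear to live in an iframe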
Is there some other issue that I'm missing?
Sources
Source: Stack Overflow, licensed under CC BY-SA 3.0.