Selenium on Python: troubles with some pages
I'm trying to collect information from a forum and discovered that some pages are broken, and I can't understand what's wrong with them. The only way I could work around the problem was a try/except. It looks like an issue with badly encoded characters. I can't tell what's on those pages because it's a specialised forum, so I can't spot the bad characters just by checking the page with my eyes :)
Examples of "dead" links:
http://forum.worldoftanks.ru/index.php?/topic/1765984-105-lefh18b2-%d0%b3%d0%b0%d0%b9%d0%b4-%d1%83%d1%81%d1%82%d0%b0%d1%80%d0%b5%d0%bb/
http://forum.worldoftanks.ru/index.php?/topic/1727296-panzer-58-mutz-%d1%81%d1%82%d0%b0%d1%80%d1%82%d0%be%d0%b2%d0%b0%d1%8f-%d1%82%d0%b5%d0%bc%d0%b0/
Example of "live" link: http://forum.worldoftanks.ru/index.php?/topic/2124430-progetto-cc55-mod-54-%d1%81%d1%82%d0%b0%d1%80%d1%82%d0%be%d0%b2%d0%b0%d1%8f-%d1%82%d0%b5%d0%bc%d0%b0/
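Since the topic slugs in those URLs are percent-encoded UTF-8 (Cyrillic), one way to "read" the pages without eyeballing them is to decode the slugs offline with the standard library. A minimal sketch, using one dead and one live link from above:

```python
from urllib.parse import unquote

# One "dead" and one "live" link from the question; the slug after the
# topic id is percent-encoded UTF-8, so decoding shows the readable title.
dead = ("http://forum.worldoftanks.ru/index.php?/topic/1765984-105-lefh18b2-"
        "%d0%b3%d0%b0%d0%b9%d0%b4-%d1%83%d1%81%d1%82%d0%b0%d1%80%d0%b5%d0%bb/")
live = ("http://forum.worldoftanks.ru/index.php?/topic/2124430-progetto-cc55-mod-54-"
        "%d1%81%d1%82%d0%b0%d1%80%d1%82%d0%be%d0%b2%d0%b0%d1%8f-%d1%82%d0%b5%d0%bc%d0%b0/")

for url in (dead, live):
    print(unquote(url))  # Cyrillic titles become visible here
```

Both slugs decode to perfectly valid UTF-8, which suggests the URLs themselves are not the problem.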
My Python code:
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# One of the "dead" links from above
PAGE_LINK = ("http://forum.worldoftanks.ru/index.php?/topic/1765984-105-lefh18b2-"
             "%d0%b3%d0%b0%d0%b9%d0%b4-%d1%83%d1%81%d1%82%d0%b0%d1%80%d0%b5%d0%bb/")

chrome_options = Options()
# In Selenium 4 the page-load strategy is set on Options;
# the old desired_capabilities argument has been removed.
chrome_options.page_load_strategy = "eager"
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-extensions')
chrome_options.add_argument('--disable-gpu')

with Chrome(options=chrome_options) as browser:
    browser.set_page_load_timeout(10)
    browser.get(PAGE_LINK)
    author = browser.find_element(
        by=By.XPATH, value='/html/body/div[2]/div[3]/div[4]/div/span[1]'
    ).text
    print(author)
This is just an example, but I can do nothing with those pages, even with BeautifulSoup. Please help :)
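Nothing above proves the cause, but if the server really does ship malformed UTF-8, fetching the raw bytes (e.g. via `requests`' `response.content`) and decoding them by hand pinpoints the bad byte instead of just failing. A minimal sketch with a simulated truncated byte sequence (the live URL's title, cut short by one byte):

```python
# Simulate a response whose final multi-byte character was truncated --
# the kind of bad byte sequence that can make a page look "broken".
raw = "Panzer 58 Mutz стартовая тема".encode("utf-8")
broken = raw[:-1]  # drop the last byte of the final Cyrillic letter

# Strict decoding raises; this is what a bare try/except would be hiding.
try:
    broken.decode("utf-8")
except UnicodeDecodeError as exc:
    print("bad byte at position", exc.start)

# errors="replace" keeps the text and marks the bad spot with U+FFFD,
# so the rest of the page is still usable.
print(broken.decode("utf-8", errors="replace"))
```

The same idea applies to Selenium's `browser.page_source`: once you have the text, searching it for U+FFFD shows exactly where the encoding went wrong.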
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
