Selenium on Python: trouble with some pages

I'm trying to collect information from a forum and have discovered that some pages are broken, and I can't work out what is wrong with them. The only workaround I have found is wrapping the page load in try/except. It looks like a problem with incorrectly encoded characters. I don't understand the content of those pages because it is a specialized forum, so I can't spot the bad characters just by looking at the page :)
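To make the try/except workaround concrete, here is a minimal sketch of a crawl loop that records which links fail and why, instead of silently skipping them. The names `collect` and `fetch` are hypothetical; `fetch` stands in for whatever function loads one page (for example, a wrapper around the Selenium code below). With Selenium, `selenium.common.exceptions.WebDriverException` could be passed as `errors`, since both timeouts and missing elements derive from it.

```python
from typing import Callable, Dict, Iterable, Tuple, Type


def collect(
    links: Iterable[str],
    fetch: Callable[[str], str],
    errors: Tuple[Type[BaseException], ...] = (Exception,),
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Call fetch(link) for every link; return (results, failures).

    failures maps each broken link to the exception type and message,
    so the "dead" pages can be grouped by the kind of error they raise.
    """
    results: Dict[str, str] = {}
    failures: Dict[str, str] = {}
    for link in links:
        try:
            results[link] = fetch(link)
        except errors as exc:
            # Keep the reason; it often shows whether the page timed out
            # or loaded but lacked the expected element.
            failures[link] = f"{type(exc).__name__}: {exc}"
    return results, failures
```

Looking at `failures` afterwards may show whether the "dead" pages all fail the same way, which narrows down the cause before blaming the encoding.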

Examples of "dead" links:
http://forum.worldoftanks.ru/index.php?/topic/1765984-105-lefh18b2-%d0%b3%d0%b0%d0%b9%d0%b4-%d1%83%d1%81%d1%82%d0%b0%d1%80%d0%b5%d0%bb/
http://forum.worldoftanks.ru/index.php?/topic/1727296-panzer-58-mutz-%d1%81%d1%82%d0%b0%d1%80%d1%82%d0%be%d0%b2%d0%b0%d1%8f-%d1%82%d0%b5%d0%bc%d0%b0/

Example of "live" link: http://forum.worldoftanks.ru/index.php?/topic/2124430-progetto-cc55-mod-54-%d1%81%d1%82%d0%b0%d1%80%d1%82%d0%be%d0%b2%d0%b0%d1%8f-%d1%82%d0%b5%d0%bc%d0%b0/
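One quick check on the encoding theory is to decode the percent-escapes in the links themselves. The sketch below uses only the standard library; the helper name `decode_link` is made up for illustration. With `errors="strict"`, `unquote` raises `UnicodeDecodeError` if the escaped bytes are not valid UTF-8, so a link that decodes without error has a well-formed address, and any encoding problem would have to be in the page body rather than in the URL.

```python
from urllib.parse import unquote

# Links copied from the post: two "dead" ones and one "live" one.
LINKS = [
    "http://forum.worldoftanks.ru/index.php?/topic/1765984-105-lefh18b2-%d0%b3%d0%b0%d0%b9%d0%b4-%d1%83%d1%81%d1%82%d0%b0%d1%80%d0%b5%d0%bb/",
    "http://forum.worldoftanks.ru/index.php?/topic/1727296-panzer-58-mutz-%d1%81%d1%82%d0%b0%d1%80%d1%82%d0%be%d0%b2%d0%b0%d1%8f-%d1%82%d0%b5%d0%bc%d0%b0/",
    "http://forum.worldoftanks.ru/index.php?/topic/2124430-progetto-cc55-mod-54-%d1%81%d1%82%d0%b0%d1%80%d1%82%d0%be%d0%b2%d0%b0%d1%8f-%d1%82%d0%b5%d0%bc%d0%b0/",
]


def decode_link(url: str) -> str:
    """Decode percent-escapes as UTF-8, failing loudly on malformed bytes."""
    return unquote(url, encoding="utf-8", errors="strict")


if __name__ == "__main__":
    for link in LINKS:
        print(decode_link(link))
```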

My Python code:

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# One of the "dead" links from above
PAGE_LINK = "http://forum.worldoftanks.ru/index.php?/topic/1765984-105-lefh18b2-%d0%b3%d0%b0%d0%b9%d0%b4-%d1%83%d1%81%d1%82%d0%b0%d1%80%d0%b5%d0%bb/"

chrome_options = Options()
# "eager": get() returns once the DOM is ready, without waiting for images and stylesheets
chrome_options.page_load_strategy = "eager"
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-gpu")

with Chrome(options=chrome_options) as browser:
    browser.set_page_load_timeout(10)  # seconds
    browser.get(PAGE_LINK)
    # Absolute XPath to the element that holds the author name
    author = browser.find_element(By.XPATH, "/html/body/div[2]/div[3]/div[4]/div/span[1]").text
    print(author)

This is just an example, but I can't do anything with those pages, even with BeautifulSoup. Please help :)
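Since the bad characters cannot be spotted by eye, one option is to let Python locate them. The sketch below assumes the raw bytes of a page have already been downloaded some other way (for example with `urllib.request.urlopen(url).read()`) and that the page is supposed to be UTF-8; both are assumptions, and the function name `find_invalid_bytes` is hypothetical. It returns the offset and value of every byte sequence that fails to decode.

```python
from __future__ import annotations


def find_invalid_bytes(raw: bytes, encoding: str = "utf-8") -> list[tuple[int, bytes]]:
    """Return (offset, offending_bytes) for each span that fails to decode."""
    problems: list[tuple[int, bytes]] = []
    pos = 0
    while pos < len(raw):
        try:
            raw[pos:].decode(encoding)
            break  # everything from pos onward decodes cleanly
        except UnicodeDecodeError as exc:
            # exc.start / exc.end are relative to the slice raw[pos:]
            start, end = pos + exc.start, pos + exc.end
            problems.append((start, raw[start:end]))
            pos = end  # resume scanning right after the bad span
    return problems
```

An empty result for a "dead" page would suggest the bytes are valid in the assumed encoding, so the failure is more likely a timeout or a different page layout than a character problem.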



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
