Selenium timeout exception on webpage content rendering

I am trying to scrape XLS spreadsheets from São Paulo's Secretary of Public Safety and, depending on the selected data, report generation takes a long time (> 10 min) before the spreadsheet is served. Because of that, my Python script fails with:

    Traceback (most recent call last):
      File "/home/olivieri/Documents/data_science/projeto_criminalidade_sp/ssp_scrap_js-exec-1.py", line 135, in <module>
        table_scrapping(table)
      File "/home/olivieri/Documents/data_science/projeto_criminalidade_sp/ssp_scrap_js-exec-1.py", line 48, in table_scrapping
        driver.execute_script("__doPostBack('ctl00$cphBody$ExportarBOLink','')")
      File "/home/olivieri/.local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 879, in execute_script
        return self.execute(command, {
      File "/home/olivieri/.local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 425, in execute
        self.error_handler.check_response(response)
      File "/home/olivieri/.local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
        raise exception_class(message, screen, stacktrace)
    selenium.common.exceptions.TimeoutException: Message: timeout: Timed out receiving message from renderer: 300.000
      (Session info: headless chrome=100.0.4896.88)

The exception is raised on the driver.execute_script("__doPostBack('ctl00$cphBody$ExportarBOLink','')") call, suggesting that it timed out while waiting for the server to respond with the spreadsheet. It is quite difficult to reproduce, but with crime == "RouboCelular" the error is raised more frequently. I am looking for a way to increase the implicit timeout on driver.execute_script, or something else that makes Selenium more tolerant of an occasionally sluggish server response.
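One direction I am considering is wrapping the flaky call in a retry helper, sketched below. The helper name, retry count, and delay are placeholders I made up; in my script the callable would be the `driver.execute_script(...)` call and the exception type would be `selenium.common.exceptions.TimeoutException`:

```python
import time


def execute_with_retry(call, retries=3, delay=30.0, exceptions=(Exception,)):
    """Invoke `call`, retrying on the given exception types.

    Retry count and delay are guesses sized for the slow server;
    the last failure is re-raised so the error is not swallowed.
    """
    for attempt in range(1, retries + 1):
        try:
            return call()
        except exceptions:
            if attempt == retries:
                raise
            time.sleep(delay)
```

It would be used like `execute_with_retry(lambda: driver.execute_script("__doPostBack('ctl00$cphBody$ExportarBOLink','')"), exceptions=(TimeoutException,))`, but I am not sure whether retrying a `__doPostBack` is safe on this site, or whether raising Selenium's own timeout is the better fix.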

Here is a minimal, reproducible example:

import os
import time

from selenium.webdriver import Chrome, ChromeOptions

def wait_every_downloads_chrome(driver):
    # Ref: https://stackoverflow.com/questions/48263317/selenium-python-waiting-for-a-download-process-to-complete-using-chrome-web
    # Switch to chrome://downloads/
    if not driver.current_url.startswith("chrome://downloads"):
        driver.get("chrome://downloads/")
    
    # Fetch download status at chrome://downloads/
    not_complete = driver.execute_script("""
        var items = document.querySelector('downloads-manager')
            .shadowRoot.getElementById('downloadsList').items;
        return !items.every(e => e.state === "COMPLETE");
        """)
    
    # Poll again until every download has completed, then quit
    if not_complete:
        time.sleep(0.5)
        wait_every_downloads_chrome(driver)
    else:
        driver.quit()

def table_scrapping(param):
    # Trigger the ASP.NET postbacks that select the crime type, year
    # and month, then request the XLS export
    driver.execute_script(f"__doPostBack('ctl00$cphBody$btn{param[0]}','')")
    driver.execute_script(f"__doPostBack('ctl00$cphBody$lkAno{param[1]}','')")
    driver.execute_script(f"__doPostBack('ctl00$cphBody$lkMes{param[2]}','')")
    driver.execute_script("__doPostBack('ctl00$cphBody$ExportarBOLink','')")
    

url = 'http://www.ssp.sp.gov.br/transparenciassp/'
crime = "MorteSuspeita"
years = [str(i) for i in range(13, 21 + 1)]
months = [str(i) for i in range(1, 12 + 1)]

table_list = [[crime, year, month] for year in years for month in months]
tables_path = os.getcwd()

options = ChromeOptions()
options.add_argument('--headless')
options.add_argument('--disable-notifications')
options.add_argument('--disable-download-notification')
options.add_experimental_option(
    'prefs',
    {
        "download.default_directory" : tables_path,
        "profile.default_content_setting_values.automatic_downloads": 1
    }
)

driver = Chrome(options=options)

driver.get(url)

for table in table_list:
    table_scrapping(table)

wait_every_downloads_chrome(driver)
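As an aside, an alternative to polling chrome://downloads that I have seen suggested is watching the download directory for Chrome's temporary `.crdownload` files, which are renamed once a download finishes. A sketch of that idea (the function name, timeout, and poll interval are my own guesses):

```python
import glob
import os
import time


def wait_for_downloads(directory, timeout=600.0, poll=0.5):
    """Block until no *.crdownload files remain in `directory`.

    Chrome writes in-progress downloads as *.crdownload; the timeout
    is a guess sized for the slow report server. Returns True when the
    directory is clean, False if the deadline passes first.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if not glob.glob(os.path.join(directory, "*.crdownload")):
            return True
        time.sleep(poll)
    return False
```

I have not confirmed this behaves well in headless mode, which is why I kept the chrome://downloads approach above.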


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
