Selenium timeout exception on webpage content rendering
I am trying to scrape XLS spreadsheets from São Paulo's Secretary of Public Safety and, depending on the selected data, report generation can take a long time (> 10 min) before the spreadsheet is returned. Because of that, my Python script raises the following error:
Traceback (most recent call last):
  File "/home/olivieri/Documents/data_science/projeto_criminalidade_sp/ssp_scrap_js-exec-1.py", line 135, in <module>
    table_scrapping(table)
  File "/home/olivieri/Documents/data_science/projeto_criminalidade_sp/ssp_scrap_js-exec-1.py", line 48, in table_scrapping
    driver.execute_script("__doPostBack('ctl00$cphBody$ExportarBOLink','')")
  File "/home/olivieri/.local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 879, in execute_script
    return self.execute(command, {
  File "/home/olivieri/.local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 425, in execute
    self.error_handler.check_response(response)
  File "/home/olivieri/.local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: timeout: Timed out receiving message from renderer: 300.000
(Session info: headless chrome=100.0.4896.88)
The exception is raised when executing driver.execute_script("__doPostBack('ctl00$cphBody$ExportarBOLink','')"), which suggests it timed out waiting for the server to respond with the spreadsheet. It is quite difficult to reproduce, but if you set crime == "RouboCelular" the error is raised more frequently. I am looking for a way to increase the implicit timeout on the driver.execute_script call, or for some other approach that makes Selenium more tolerant of occasionally sluggish server responses.
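For reference, one common mitigation is to wrap the failing call in a retry loop that catches the timeout exception and tries again after a pause. The sketch below is generic and does not require a browser: the names `retry_on_timeout` and `flaky` are hypothetical, and in the real script the callable would be the `driver.execute_script(...)` postback and the caught type would be `selenium.common.exceptions.TimeoutException`.

```python
import time


def retry_on_timeout(fn, retries=3, delay=0.1, exceptions=(Exception,)):
    """Call fn(); on one of `exceptions`, wait and retry up to `retries` times."""
    for attempt in range(retries):
        try:
            return fn()
        except exceptions:
            if attempt == retries - 1:
                raise  # out of attempts: re-raise the last timeout
            time.sleep(delay)


# Demo with a flaky callable standing in for driver.execute_script:
# it fails twice, then succeeds on the third call.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated renderer timeout")
    return "ok"

result = retry_on_timeout(flaky, retries=5, exceptions=(TimeoutError,))
```

In the real script, each `__doPostBack` call could be wrapped this way so that a single slow response does not abort the whole run.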
Here is a minimal, reproducible example:
import os
import time

from selenium.webdriver import Chrome, ChromeOptions


def wait_every_downloads_chrome(driver):
    # Ref: https://stackoverflow.com/questions/48263317/selenium-python-waiting-for-a-download-process-to-complete-using-chrome-web
    # Switch to chrome://downloads/
    if not driver.current_url.startswith("chrome://downloads"):
        driver.get("chrome://downloads/")
    # Fetch download status at chrome://downloads/
    not_complete = driver.execute_script("""
        var items = document.querySelector('downloads-manager')
            .shadowRoot.getElementById('downloadsList').items;
        if (items.every(e => e.state === "COMPLETE"))
            return false;
        else
            return true;
        """)
    # Poll again until all downloads have completed
    while not_complete:
        time.sleep(0.5)
        return wait_every_downloads_chrome(driver)
    driver.quit()


def table_scrapping(param):
    driver.execute_script(f"__doPostBack('ctl00$cphBody$btn{param[0]}','')")
    driver.execute_script(f"__doPostBack('ctl00$cphBody$lkAno{param[1]}','')")
    driver.execute_script(f"__doPostBack('ctl00$cphBody$lkMes{param[2]}','')")
    driver.execute_script("__doPostBack('ctl00$cphBody$ExportarBOLink','')")


url = 'http://www.ssp.sp.gov.br/transparenciassp/'
crime = "MorteSuspeita"
years = [str(i) for i in range(13, 21 + 1)]
months = [str(i) for i in range(1, 12 + 1)]
table_list = [[crime, year, month] for year in years for month in months]
tables_path = os.getcwd()

options = ChromeOptions()
options.add_argument('--headless')
options.add_argument('--disable-notifications')
options.add_argument('--disable-download-notification')
options.add_experimental_option(
    'prefs',
    {
        "download.default_directory": tables_path,
        "profile.default_content_setting_values.automatic_downloads": 1
    }
)

driver = Chrome(options=options)
driver.get(url)

for table in table_list:
    table_scrapping(table)

wait_every_downloads_chrome(driver)
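As an aside, the recursive download wait in the example can be flattened into a bounded polling loop, which also gives a natural place to enforce an overall deadline instead of waiting forever. A minimal sketch, with hypothetical names: `wait_for_downloads` and the `fake_check` stub are not part of the original script, and in the real script `check_complete` would run the `chrome://downloads` status script and return its result.

```python
import time


def wait_for_downloads(check_complete, timeout=600, poll=0.5):
    """Poll check_complete() until it returns True or the deadline passes.

    Returns True if downloads finished, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check_complete():
            return True
        time.sleep(poll)
    return False


# Demo with a stub that reports completion on the third poll,
# standing in for the chrome://downloads status script.
state = {"polls": 0}

def fake_check():
    state["polls"] += 1
    return state["polls"] >= 3

ok = wait_for_downloads(fake_check, timeout=5, poll=0.01)
```

Returning a boolean instead of quitting the driver inside the wait function keeps the shutdown decision in the calling code.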
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
