Selenium element is not attached to the page document
I am trying to scrape this particular site with Python: https://www.milanofinanza.it/quotazioni/ricerca/listino-completo-2ae?refresh_cens.
I need to get all the ISIN codes and the names. My idea was to collect them in two separate lists; to do that, I grab an entire column at a time (by changing the XPath to `tr` rather than `tr[1]`) and append its cells to the list.
My code pages through the site, but at a certain point it just stops working (even if I add `time.sleep(10)` to make sure scraping only starts once the site is fully loaded).
My code looks like this:
```python
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

wd = webdriver.Chrome()
wd.implicitly_wait(10)
wd.get('https://www.milanofinanza.it/quotazioni/ricerca/listino-completo-2ae')

company_name = []
isin = []
for n in range(0, 15):
    time.sleep(10)
    tickers = wd.find_elements(By.XPATH, "//*[@id='mainbox']/div[2]/div[2]/div[4]/div/table/tbody/tr/td[1]")
    isin_cells = wd.find_elements(By.XPATH, "/html/body/div[3]/div/div/div/main/div[1]/div[2]/div[2]/div[4]/div/table/tbody/tr/td[10]/span")
    for el in tickers:
        company_name.append(el.text)
    for i in isin_cells:
        isin.append(i.text)
    # click the "next page" button via JavaScript
    l = wd.find_element(By.XPATH, "/html/body/div[2]/div/div/div/main/div[1]/div[2]/div[2]/div[4]/div/div[1]/div/button[4]")
    wd.execute_script("arguments[0].click();", l)
print("data collected")
```
How can I solve this problem? Here are two screenshots to better illustrate:

[Screenshot: the Name column]

[Screenshot: the Isin column]
Solution 1:
My first instinct is to tell you that Selenium is probably a little bloated for what you're doing. There are times when you need a full-fledged browser, but this isn't one of them. I'd recommend requests and Beautiful Soup instead (they're better suited to making a ton of requests). I appreciate that you were running JavaScript to load more items (although for me, the reload button wasn't doing anything); in that case a browser is necessary. But I viewed the site's source and discovered that the data you want can be retrieved in JSON format (so no need for Beautiful Soup at all) with a simple GET request.
```python
import requests

data = requests.get("https://www.milanofinanza.it/Mercati/GetDataTabelle?alias=&campoOrdinamento=0002&numElem=30&ordinamento=asc&page=4&url=listino-completo-2ae?refresh_cens")
print(data.text)
```
Or do it like this so it's easier to adjust the params:
```python
def pack(**kwargs):
    return kwargs

data2 = requests.get(
    "https://www.milanofinanza.it/Mercati/GetDataTabelle",
    params=pack(
        alias="",
        campoOrdinamento="0002",
        numElem=30,
        ordinamento="asc",
        page=4,
        url="listino-completo-2ae",
        refresh_cens="",
    ),
)
```
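Once the JSON is in hand, pulling out the two lists the question asked for is a couple of list comprehensions. The payload below is a made-up stand-in: the keys `rows`, `name`, and `isin` are hypothetical, so inspect an actual `GetDataTabelle` response to find the real ones.

```python
import json

# Hypothetical response shape -- the real key names must be read off
# an actual GetDataTabelle response; these are placeholders.
payload = json.loads("""
{"rows": [
    {"name": "Acme SpA", "isin": "IT0000000001"},
    {"name": "Beta SpA", "isin": "IT0000000002"}
]}
""")

company_name = [row["name"] for row in payload["rows"]]
isin = [row["isin"] for row in payload["rows"]]
```

With the real keys substituted in, this replaces both `find_elements` calls and the XPath bookkeeping entirely.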
I'm out of time; if I got the wrong data, LMK, and I'll correct the answer.
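To replicate the original 15-page sweep, the same request can be repeated while incrementing the `page` parameter. A sketch of building the per-page query dicts (the parameter names are copied from the URL above and the page count from the question's `range(0, 15)` loop; verify both against the live site):

```python
def page_params(page):
    # Parameter names copied from the GetDataTabelle URL above;
    # only "page" changes between requests.
    return {
        "alias": "",
        "campoOrdinamento": "0002",
        "numElem": 30,
        "ordinamento": "asc",
        "page": page,
        "url": "listino-completo-2ae",
    }

# One dict per page, mirroring the original 15-iteration loop.
all_params = [page_params(p) for p in range(1, 16)]
```

Each dict can then be passed as `params=` to `requests.get`, exactly like the `pack` version above.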
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | JadeSpy |


