Python Multi-threading Selenium click
I have one long script that lets me web-scrape a dynamic table from three different, independent websites using Selenium (the table requires multiple clicks on SVG objects before scraping to obtain the details I require). I have been trying to use Python threading to scrape the three websites concurrently to speed up the process. I attempted the following:
from time import sleep, perf_counter
from threading import Thread
# task1, task2 and task3 are the scraping functions (one per website), defined elsewhere
start_time = perf_counter()
threads = []
# create three new threads
t1 = Thread(target=task1)
threads.append(t1)
t2 = Thread(target=task2)
threads.append(t2)
t3 = Thread(target=task3)
threads.append(t3)
# start the threads
for t in threads:
    t.start()
# wait for the threads to complete
for thread in threads:
    thread.join()
end_time = perf_counter()
print(f'It took {end_time - start_time:.0f} second(s) to complete.')
FYI, task1, task2 and task3 above each scrape a different website.
Whilst the code above doesn't break (i.e. it opens the three websites and starts clicking and scraping on each one), task1 usually finishes clicking first and then scrapes. When it does, task2 and task3 also suddenly stop clicking and go straight to scraping, so not all details are captured before the scrape, which is not what I want.
My understanding of thread.join() was that each thread will not finish until all the threads have finished running, and that each thread is independent of the others. Yet, although all threads finish at the same time, the clicking on SVG objects in task2 and task3 is cut short when task1's clicking is done.
This did not happen before I applied Python threading, so I'm not sure what is causing this issue or whether there is a solution for it.
Many thanks in advance
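A minimal sketch of the same start()/join() pattern, with Selenium replaced by a hypothetical stand-in function (fake_task, its click counts and the results dict are illustrative, not from the question). Each thread keeps its own click counter in local variables and writes only its own key, so one thread finishing early does not cut the others short; join() only makes the main thread wait.

```python
import threading
import time

# Hypothetical stand-in for task1/task2/task3: "clicks" n times, then records the result.
# The click counter is local to each call, so threads cannot interfere with each other.
def fake_task(name, n_clicks, results):
    clicks = 0
    for _ in range(n_clicks):
        time.sleep(0.01)  # simulate one click on an SVG element
        clicks += 1
    results[name] = clicks  # each thread writes only its own key

results = {}
threads = [
    threading.Thread(target=fake_task, args=("site1", 3, results)),
    threading.Thread(target=fake_task, args=("site2", 10, results)),
    threading.Thread(target=fake_task, args=("site3", 20, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # the main thread waits here; the workers are not interrupted

print(results)
```

The design point the sketch relies on is that all per-site state lives inside the function call rather than in module-level variables, so site2 and site3 complete all of their clicks even though site1 finishes first.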
Solution 1:[1]
thread.join() waits until that thread has finished its job: it blocks the thread that calls it (here, the main thread) until then.
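A small sketch of that behaviour (the work function and its 0.2 s delay are hypothetical): is_alive() reports the thread as running before join(), and as finished once join() has returned.

```python
import threading
import time

def work():
    time.sleep(0.2)  # simulate a slow scraping task

t = threading.Thread(target=work)
t.start()
alive_before = t.is_alive()  # True: work() is still sleeping
t.join()                     # blocks the calling (main) thread until work() returns
alive_after = t.is_alive()   # False: the thread has finished
print(alive_before, alive_after)
```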
So the loop doesn't work the way you think. After this part of the code:
for t in threads:
    t.start()
the main thread moves straight on to this part (start() returns immediately; it does not wait for the thread to finish):
for thread in threads:
    thread.join()
Here, the loop calls join() on all three threads, one after another, which is equivalent to:
t1.join()
t2.join()
t3.join()
I think you can just skip the loop and write the calls one after another, as I did above:
t1.join()
t2.join()
t3.join()
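A quick sketch, using time.sleep with made-up durations in place of the three scraping tasks, suggests that the explicit calls and the loop behave the same way: each join() waits only for its own thread, and because all three threads run at the same time, the total wait is roughly the duration of the slowest thread rather than the sum.

```python
import threading
import time

# Hypothetical tasks with different durations standing in for the three sites.
durations = [0.1, 0.2, 0.3]
threads = [threading.Thread(target=time.sleep, args=(d,)) for d in durations]

start = time.perf_counter()
for t in threads:
    t.start()

t1, t2, t3 = threads
t1.join()  # returns after ~0.1 s
t2.join()  # returns after ~0.2 s in total (t2 was already running)
t3.join()  # returns after ~0.3 s in total
elapsed = time.perf_counter() - start

# Same result as "for t in threads: t.join()": waits for the slowest thread, not the sum.
print(f"{elapsed:.2f} s")
```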
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
