How to reduce Selenium scraping time when inputting ZIP codes?

I am trying to scrape the prices of allergy products on Target. For each product, I input every US ZIP code to see how changing the ZIP code affects the price, and I use Selenium to enter each ZIP code. However, I have more than 40,000 ZIP codes and 200 products to scrape in total. If I run my code as it is, the run time will be far too long (almost 90 days), because Selenium needs about 2 seconds to enter each ZIP code. What should I do to reduce the running time?

from datetime import datetime

import pandas as pd
import pytz
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

while True:
    priceArray = []
    nameArray = []
    zipCodeArray = []
    GMTArray = []

    wait_imp = 10
    CO = webdriver.ChromeOptions()
    CO.add_experimental_option('useAutomationExtension', False)
    CO.add_argument('--ignore-certificate-errors')
    CO.add_argument('--start-maximized')
    wd = webdriver.Chrome(r'D:\chromedriver\chromedriver_win32new\chromedriver_win32 (2)\chromedriver.exe', options=CO)

    for url in urlList:
        wd.get(url)
        wd.implicitly_wait(wait_imp)

        for zipcode in zipCodeList:
            try:
                #click the delivery address
                address = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[4]/div/div[1]/button[2]")
                address.click()
                #click the Edit location
                editLocation = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[4]/div/div[2]/button")
                editLocation.click()
            except NoSuchElementException:
                #directly click the Edit location
                editLocation = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[4]/div[1]/div/div[1]/button")
                editLocation.click()

            #input ZipCode
            inputZipCode = wd.find_element(by=By.XPATH, value="//*[@id='enter-zip-or-city-state']")
            inputZipCode.clear()
            inputZipCode.send_keys(zipcode)

            #click submit
            clickSubmit = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[4]/div/div[2]/div/div/div[3]/div/button[1]")
            clickSubmit.click()

            #start scraping
            name = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[1]/h1/span").text
            nameArray.append(name)
            price = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[1]/div[1]/span").text
            priceArray.append(price)
            zipCodeArray.append(zipcode)
            #record the scrape timestamp in London time
            tz = pytz.timezone('Europe/London')
            GMT = datetime.now(tz)
            GMTArray.append(GMT)


    data = {'prod-name': nameArray,
            'Price': priceArray,
            'currentZipCode': zipCodeArray,
            "GMT": GMTArray
            }
    df = pd.DataFrame(data, columns=['prod-name', 'Price', 'currentZipCode', 'GMT'])
    df.to_csv(r'C:\Users\12987\PycharmProjects\Network\priceingAlgoriCoding\export_Target_dataframe.csv', mode='a', index=False, header=True)


Solution 1:[1]

For Selenium, use Python's concurrent.futures to run several drivers in parallel.

You may check out this answer - link

Here is a snippet using ThreadPoolExecutor:

from concurrent import futures

from selenium import webdriver

def selenium_title(url):
    wdriver = webdriver.Chrome()  # one Chrome driver per task
    wdriver.get(url)
    title = wdriver.title
    wdriver.quit()
    return title

links = ["https://www.amazon.com", "https://www.google.com"]

with futures.ThreadPoolExecutor() as executor:  # default/optimized number of threads, or pass e.g. max_workers=10
    titles = list(executor.map(selenium_title, links))

You can also use ProcessPoolExecutor:

with futures.ProcessPoolExecutor() as executor:  # default/optimized number of processes
    titles = list(executor.map(selenium_title, links))

Thus you can achieve roughly an x-times speedup, where x is the number of workers (Selenium work is mostly I/O-bound, so threads help here despite the GIL).
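Applied to the question, the same pattern can split the ZIP code list into chunks and give each worker its own driver and its own chunk. Here is a minimal sketch with a stub in place of the real Selenium work; the scrape_chunk helper, the placeholder URL, and the sample ZIP codes are illustrative assumptions, not part of the original code:

```python
from concurrent import futures

def scrape_chunk(args):
    # Hypothetical worker: in the real script this would create its own
    # Chrome driver, open the product URL, and enter each ZIP code in turn.
    url, zip_chunk = args
    return [(url, z, f"price-for-{z}") for z in zip_chunk]

def chunks(seq, n):
    # Split seq into chunks of roughly len(seq)//n items, one per worker.
    k = max(1, len(seq) // n)
    return [seq[i:i + k] for i in range(0, len(seq), k)]

urlList = ["https://example.com/product-1"]            # placeholder URL
zipCodeList = [f"{z:05d}" for z in range(10001, 10021)]  # 20 sample ZIPs

n_workers = 4
jobs = [(url, chunk) for url in urlList for chunk in chunks(zipCodeList, n_workers)]

with futures.ThreadPoolExecutor(max_workers=n_workers) as executor:
    # executor.map preserves job order, so the rows come back sorted by chunk
    results = [row for rows in executor.map(scrape_chunk, jobs) for row in rows]

print(len(results))  # one row per (url, zip) pair
```

With 40,000 ZIP codes the chunking keeps the number of driver launches equal to the number of workers rather than the number of ZIP codes, which is where most of the per-request overhead goes.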

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ahmedshahriar