How do I loop from one page to another in my web scraping project? It should scrape data from all 250 pages, but it stops at the first page.
So this is part of the scraping code, but it never gets past the first page. Please help me loop through all 250 pages of the Etsy e-commerce website.
URL = f'https://www.etsy.com/in-en/c/jewelry/earrings/ear-jackets-and-climbers?ref=pagination&page={page}'
try:
    # Count for every page of the website
    URL = URL.format(page)
    browser.get(URL)
    print("Scraping Page:", page)
    # xpath of product table
    PATH_1 = '//*[@id="content"]/div/div[1]/div/div[3]/div[2]/div[2]/div[9]/div/div/div'
    # getting total items
    items = browser.find_element(By.XPATH, PATH_1)
    items = items.find_elements(By.TAG_NAME, 'li')
    # available items on the page
    end_product = len(items)
    # Count for every product of the page
    for product in range(0, end_product):
        print("Scraping reviews for product", product + 1)
        # clicking on product
        try:
            items[product].find_element(By.TAG_NAME, 'a').click()
        except:
            print('Product link not found')
        # switch the focus of the driver to the new tab
        windows = browser.window_handles
        browser.switch_to.window(windows[1])
        try:
            PATH_2 = '//*[@id="reviews"]/div[2]/div[2]'
            count = browser.find_element(By.XPATH, PATH_2)
            # Number of reviews on the page
            count = count.find_elements(By.CLASS_NAME, 'wt-grid wt-grid--block wt-mb-xs-0')
            for r1 in range(1, len(count) + 1):
                dat1 = browser.find_element(By.XPATH,
                    '//*[@id="reviews"]/div[2]/div[2]/div[1]/div[1]/p'.format(r1)).text
                if dat1[:dat1.find(',') - 6] not in person:
                    try:
                        person.append(dat1[:dat1.find(',') - 6])
                        date.append(dat1[dat1.find(',') - 6:])
                    except Exception:
                        person.append("Not Found")
                        date.append("Not Found")
                    try:
                        stars.append(browser.find_element(By.XPATH,
                            '//*[@id="reviews"]/div[2]/div[2]/div[1]/div[2]/div[1]/div/div/span/span[2]'.format(r1)).text[0])
                    except Exception:
                        stars.append("No stars")
        except Exception:
            browser.close()
        # switching focus to the main tab
        browser.switch_to.window(windows[0])
        # export data after every product
        # export_data()
except Exception as e_1:
    print(e_1)
    print("Program stopped:")
    export_data()
    browser.quit()
# defining the main function
def main():
    logging.basicConfig(filename='solution_etsy.log', level=logging.INFO)
    logging.info('Started')
    if 'page.txt' in os.listdir(os.getcwd()):
        with open('page.txt', 'r') as file1:
            page = int(file1.read())
        for i in range(1, 250):
            run_scraper(i, browser)
    else:
        for i in range(1, 250):
            with open('page.txt', 'w') as file:
                file.write(str(i))
            run_scraper(i, browser)
    export_data()
    print("--- %s seconds ---" % (time.time() - start_time))
    logging.info('Finished')

# Calling the main function
if __name__ == '__main__':
    main()
So in this code, where do I apply the loop so that it moves from one page to the next?
Solution 1:[1]
from time import sleep
from bs4 import BeautifulSoup
from tqdm.notebook import tqdm

stud = 'https://www.etsy.com/in-en/c/jewelry/earrings/ear-jackets-and-climbers?ref=pagination&page={}'

for i in tqdm(range(1, 251)):  # pages 1 through 250
    url_pages = stud.format(i)
    browser.get(url_pages)
    sleep(4)  # a sleep of 4 seconds lets the entire page load; adjust it to your internet speed
    html = browser.page_source
    soup = BeautifulSoup(html, 'html.parser')
    # function or anything else that you want to apply from here
Then follow the rest of your steps as you wish; this will load all of the pages.
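To make that concrete, here is a minimal sketch of how such a page loop can drive the per-page routine from the question. It assumes the per-page work lives in a function shaped like the question's run_scraper(page, browser); that name and signature come from the question's main(), not from any library, and the body placeholder stands in for your existing product/review code:

from time import sleep
from selenium import webdriver

stud = 'https://www.etsy.com/in-en/c/jewelry/earrings/ear-jackets-and-climbers?ref=pagination&page={}'
browser = webdriver.Chrome()  # assumes a local chromedriver; use whichever driver you already create

def run_scraper(page, browser):
    # Build the URL for this page from the page number that was passed in,
    # then navigate to it -- this is what actually moves you off page 1.
    browser.get(stud.format(page))
    sleep(4)  # give the listing time to load
    # ... your existing product/review scraping for the current page goes here ...

for page in range(1, 251):  # one call per page, 1 through 250
    run_scraper(page, browser)

The important part is that browser.get() runs on every iteration with a URL built from the loop variable, rather than from a URL string that was formatted once before the loop started.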
If it is still not working, have a look at how I scraped data from multiple pages of a similar site. GitHub link to my solution: https://github.com/PullarwarOm/devtomanager.com-Web-Scraping/blob/main/devtomanager%20Final.ipynb
The page scraped there is similar to the one you are trying to scrape, so go through that ipynb file. Thank you.
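As an aside on the sleep(4) above: if you prefer not to hard-code a delay, Selenium's explicit waits can block until the results list is actually present. This is only a sketch; the CSS selector below is a placeholder, not Etsy's real markup, so inspect the page and substitute your own:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Chrome()
browser.get('https://www.etsy.com/in-en/c/jewelry/earrings/ear-jackets-and-climbers?ref=pagination&page=1')

# Wait up to 10 seconds until at least one <li> inside the results grid exists,
# instead of sleeping for a fixed time. 'ul.wt-grid li' is a placeholder selector.
WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'ul.wt-grid li'))
)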
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stack Overflow |
