How to make this code run many times based on the "page_num" variable, to scrape all pages using BeautifulSoup?
I'm trying to scrape https://www.bayut.sa/en/riyadh-region/villas-for-sale-in-riyadh/page-2/. The code succeeds in scraping the first page only (which is page-2 here), but it does not scrape the rest of the pages up to the last page on the website.
I want to scrape all pages the same way as the first page, by increasing the page_num variable.
import requests
from bs4 import BeautifulSoup
import csv
from itertools import zip_longest

# 1 lists (created once, before the loop, so results from all pages accumulate
# instead of being reset on every iteration)
district_Name = []
property_size = []
property_price = []
links = []
dates = []
# beds = []
# paths = []

# 2 the link to the website; `page_num` is included in the link
main_url = "https://www.bayut.sa"
page_num = 2

while page_num < 288:
    try:
        result = requests.get(f"https://www.bayut.sa/en/riyadh-region/villas-for-sale/page-{page_num}/")
        src = result.content

        # 4 create soup (the lxml parser must be installed: pip install lxml)
        soup = BeautifulSoup(src, "lxml")

        # 5 titles we need: districtName, property age, size, rooms, price
        districtName = soup.find_all("div", {"aria-label": "Location"})
        size = soup.find_all("span", {"aria-label": "Area"})
        price = soup.find_all("span", {"aria-label": "Price"})
        listing_link = soup.find_all("a", {"aria-label": "Listing link"})
        bed = soup.find_all("span", {"aria-label": "Beds", "class": "b6a29bc0"})
        # note: the original searched "Beds" here as well; "Baths" is the likely intended label
        path = soup.find_all("span", {"aria-label": "Baths", "class": "b6a29bc0"})

        # 6 for loop to get text and append it to the lists
        new_links = []
        for i in range(len(districtName)):
            district_Name.append(districtName[i].text)
            new_links.append(main_url + listing_link[i].attrs["href"])
            property_size.append(size[i].text)
            property_price.append(price[i].text)
            # beds.append(bed[i].text)
            # paths.append(path[i].text)
        links.extend(new_links)

        # 7 extract the post date from each inner page found on this listing page
        for link in new_links:
            result = requests.get(link)
            src = result.content
            soup = BeautifulSoup(src, "lxml")
            date = soup.find("span", {"aria-label": "Reactivated date"})
            dates.append(date.text if date else "")
    except Exception as e:
        print(e)
    page_num += 1  # increment inside the while loop so every page is visited

file_list = [district_Name, property_size, property_price, dates, links]
exported = zip_longest(*file_list)

# 8 create a csv file and fill it with the scraped values
with open("C:/Users/Manso/Desktop/files/Data Analysis/Riyadh_Homes_english_beta.csv", "w", encoding="utf-8") as homes_file:
    wr = csv.writer(homes_file, lineterminator="\n")
    wr.writerow(["district_Name", "property_size", "property_price", "dates", "links"])
    wr.writerows(exported)
Solution 1:[1]
Your increment of page_num is outside the while loop; make sure you place it inside.
You should change the while condition to be:
while page_num < 288:
and remove the if, as it is doing nothing: you are setting the variable to 2 and then checking whether it is less than 288, which executes only once, so it has no effect.
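For completeness, here is a minimal sketch of the corrected loop structure. It reuses the "Location" aria-label from the question's code; stopping when a page returns no listings (instead of hardcoding 288 pages) is an assumption about the site's behaviour, not part of the original answer:

# Minimal sketch of the corrected pagination loop.
# Assumption: a page past the last one yields zero "Location" elements,
# so an empty result is used as the stop condition instead of a fixed limit.
import requests
from bs4 import BeautifulSoup

page_num = 2
while True:
    result = requests.get(
        f"https://www.bayut.sa/en/riyadh-region/villas-for-sale/page-{page_num}/"
    )
    soup = BeautifulSoup(result.content, "lxml")
    listings = soup.find_all("div", {"aria-label": "Location"})
    if not listings:  # no listings found: we are past the last page, so stop
        break
    print(f"page {page_num}: {len(listings)} listings")
    page_num += 1  # increment inside the loop so every page is visited

The key point either way is that page_num += 1 runs on every pass through the loop body, so each iteration requests a new page.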
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mixone |
