How do I select elements that only load after scrolling down?

I've been scraping some websites in Python for practice, and I've noticed that when I select a set of elements in a scrollable area, I only get the first few. The rest only show up once I scroll down and they load. Here's an example:

from bs4 import BeautifulSoup
import requests

Zillow_URL = "https://www.zillow.com/homes/for_rent/1-_beds/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3Anull%2C%22mapBounds%22%3A%7B%22west%22%3A-122.56276167822266%2C%22east%22%3A-122.30389632177734%2C%22south%22%3A37.69261345230467%2C%22north%22%3A37.857877098316834%7D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22fr%22%3A%7B%22value%22%3Atrue%7D%2C%22fsba%22%3A%7B%22value%22%3Afalse%7D%2C%22fsbo%22%3A%7B%22value%22%3Afalse%7D%2C%22nc%22%3A%7B%22value%22%3Afalse%7D%2C%22cmsn%22%3A%7B%22value%22%3Afalse%7D%2C%22auc%22%3A%7B%22value%22%3Afalse%7D%2C%22fore%22%3A%7B%22value%22%3Afalse%7D%2C%22pmf%22%3A%7B%22value%22%3Afalse%7D%2C%22pf%22%3A%7B%22value%22%3Afalse%7D%2C%22mp%22%3A%7B%22max%22%3A3000%7D%2C%22price%22%3A%7B%22max%22%3A872627%7D%2C%22beds%22%3A%7B%22min%22%3A1%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A12%7D"

HEADERS = {...}  # I'm not sure whether I should make my browser info public

response = requests.get(url=Zillow_URL, headers=HEADERS)
soup = BeautifulSoup(response.text, "html.parser")

prices = soup.select("div.list-card-price")
print(prices, len(prices))

The only solution I can think of is to use a webdriver to scroll down and collect the elements, but that seems inefficient. Is there a way to get all of them without having to scroll down?



Solution 1:[1]

Not sure if this will help, but you can pull the data directly as JSON from the query below.

import requests
import json
from bs4 import BeautifulSoup as bs

headers = {
    'authority': 'www.zillow.com',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="98", "GoogleChrome";v="98"',
    'dnt': '1',
    'sec-ch-ua-mobile': '?0',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.81 Safari/537.36',
    'sec-ch-ua-platform': '"Windows"',
    'accept': '*/*',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
    'sec-gpc': '1',
}

response = requests.get('https://www.zillow.com/search/GetSearchPageState.htm?searchQueryState=%7B%22pagination%22%3A%7B%22currentPage%22%3A1%7D%2C%22mapBounds%22%3A%7B%22west%22%3A-122.35259056091309%2C%22east%22%3A-122.15174674987793%2C%22south%22%3A37.79133717593069%2C%22north%22%3A37.84707034351907%7D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22price%22%3A%7B%22min%22%3A0%2C%22max%22%3A872627%7D%2C%22monthlyPayment%22%3A%7B%22min%22%3A0%2C%22max%22%3A3000%7D%2C%22beds%22%3A%7B%22min%22%3A1%7D%2C%22isForSaleByAgent%22%3A%7B%22value%22%3Afalse%7D%2C%22isForSaleByOwner%22%3A%7B%22value%22%3Afalse%7D%2C%22isNewConstruction%22%3A%7B%22value%22%3Afalse%7D%2C%22isForSaleForeclosure%22%3A%7B%22value%22%3Afalse%7D%2C%22isComingSoon%22%3A%7B%22value%22%3Afalse%7D%2C%22isAuction%22%3A%7B%22value%22%3Afalse%7D%2C%22isForRent%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A13%7D&wants={%22cat1%22:[%22listResults%22,%22mapResults%22]}', headers=headers)
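# The body is JSON, so parse it directly; routing it through an HTML parser
# can corrupt the text if any value contains markup.
json_file = json.loads(response.text)

# Each entry in mapResults is one listing shown on the map.
firms = list(json_file['cat1']['searchResults']['mapResults'])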
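
The long `searchQueryState` value in the URL above is just URL-encoded JSON, so it may be easier to build it from a Python dict than to edit the encoded string by hand. Below is a minimal sketch of that idea. The helper names `build_search_url` and `fetch_map_results` are hypothetical, and it is an assumption (not confirmed here) that changing `currentPage` makes the endpoint return a different page of results.

```python
import json
from urllib.parse import quote

# Endpoint taken from the request above.
BASE_URL = "https://www.zillow.com/search/GetSearchPageState.htm"


def build_search_url(query_state, page=1):
    """Return the JSON endpoint URL for the given search state (hypothetical helper)."""
    # Copy so the caller's dict is not mutated, then set the page number.
    state = dict(query_state)
    state["pagination"] = {"currentPage": page}
    wants = {"cat1": ["listResults", "mapResults"]}
    # Compact separators give JSON without spaces, like the URL above.
    encoded_state = quote(json.dumps(state, separators=(",", ":")), safe="")
    encoded_wants = quote(json.dumps(wants, separators=(",", ":")), safe="")
    return f"{BASE_URL}?searchQueryState={encoded_state}&wants={encoded_wants}"


def fetch_map_results(query_state, headers, page=1):
    """Fetch one page and return its mapResults list (hypothetical helper)."""
    import requests  # third-party; imported here so the URL builder works without it

    response = requests.get(build_search_url(query_state, page), headers=headers)
    response.raise_for_status()
    # Key path is the one used in the answer above.
    return response.json()["cat1"]["searchResults"]["mapResults"]
```

With the `headers` dict from above, something like `fetch_map_results({"isMapVisible": True, "mapZoom": 13, ...}, headers)` would then replace the hand-written URL, where the dict holds the same keys as the decoded `searchQueryState`.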

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: Samt94