Beautiful Soup findAll doesn't find all information
I am trying to parse an HTML page using the BeautifulSoup Python library. However, I am unable to retrieve nested divs/classes beyond a certain point. When using the "findAll" function it does not return all of these tags. This particular site is using Bootstrap, and the info I am looking to retrieve is within an Accordion component. Does BeautifulSoup conflict with Bootstrap or am I not parsing the site correctly?
I am trying to get store location details, such as the address and postal code, from https://www.needs.ca/en/store-locator/
Code I used:
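from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

# Fetch the store-locator page with a browser-like User-Agent
req = Request('https://www.needs.ca/en/store-locator/', headers={'User-Agent': 'Mozilla/5.0'})
web_byte = urlopen(req).read()
# Parse the HTML exactly as the server returned it
soup = BeautifulSoup(web_byte, 'lxml')
soup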
Can any web scraping experts shed some light here? It would be really appreciated.
Solution 1:[1]
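Use the Network tab in your browser's developer tools to find the API endpoint the page calls. The store list is most likely loaded by JavaScript after the initial page load, which is why it is missing from the HTML that BeautifulSoup parses; Bootstrap itself is not the problem. On a quick look, POST https://www.needs.ca/wp-admin/admin-ajax.php is the endpoint they use. It returns JSON that looks like this: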
{
    "page": 1,
    "total_page": 26,
    "is_default": false,
    "stores": []
}
Here's the Python code you need to get it. I've tested the code, so it should work.
import requests

# Browser-like headers; Host matches the target site
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) Gecko/20100101 Firefox/98.0",
    "Host": "www.needs.ca"
}
# Form fields sent in the POST body
data = {"action":"search_nearest_stores","lng":"-0.1234","lat":"79.3453","page":"1"}

def main():
    res = requests.post("https://www.needs.ca/wp-admin/admin-ajax.php", headers=headers, data=data)
    if res.status_code == 200:
        print(res.text)
    else:
        print(res.status_code, res.reason)

if __name__ == "__main__":
    main()
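Since the response reports a total_page count, the remaining pages can probably be fetched by repeating the same request with a different "page" value. The following is a minimal sketch of that idea, not a tested scraper. It assumes that "page" is 1-based, that "total_page" is the last valid page, and that each entry in "stores" can be collected as-is (the shape of a store entry is not shown above). The helper names fetch_page and collect_stores are made up for illustration, and it uses the standard library's urllib rather than requests so it has no extra dependencies.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://www.needs.ca/wp-admin/admin-ajax.php"
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_page(page, lat="79.3453", lng="-0.1234"):
    """POST the search form for one page and return the decoded JSON."""
    form = {
        "action": "search_nearest_stores",
        "lng": lng,
        "lat": lat,
        "page": str(page),
    }
    # Passing data= makes urllib send a form-encoded POST request.
    req = Request(API_URL, data=urlencode(form).encode("utf-8"), headers=HEADERS)
    with urlopen(req) as res:
        return json.load(res)


def collect_stores(fetch=fetch_page):
    """Walk pages 1..total_page and gather every entry found under "stores"."""
    stores = []
    page = 1
    while True:
        payload = fetch(page)
        stores.extend(payload.get("stores", []))
        # Assumption: total_page is the number of the last page.
        # If the key is missing, stop after the current page.
        if page >= payload.get("total_page", page):
            break
        page += 1
    return stores
```

Calling collect_stores() with no arguments would query the live endpoint once per page. Passing a different fetch callable makes it easy to try the loop against saved responses first, and adding a short delay between requests would be kinder to the server.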
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
