Beautiful Soup findAll doesn't find all information

I am trying to parse an HTML page using the BeautifulSoup Python library. However, I am unable to retrieve nested divs/classes beyond a certain point. When using the "findAll" function it does not return all of these tags. This particular site is using Bootstrap, and the info I am looking to retrieve is within an Accordion component. Does BeautifulSoup conflict with Bootstrap or am I not parsing the site correctly?

I am trying to get store location details, such as the address and postal code, from https://www.needs.ca/en/store-locator/

Code I used:

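from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

# Fetch the raw HTML with a browser-like User-Agent
req = Request('https://www.needs.ca/en/store-locator/', headers={'User-Agent': 'Mozilla/5.0'})
web_byte = urlopen(req).read()
# Parse the static HTML (the unused requests.Session wrapper is dropped)
soup = BeautifulSoup(web_byte, 'lxml')
soup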

Can any web scraping experts shed some light here? It would be really appreciated.



Solution 1:[1]

You should use the network tab in the developer tools to get the API endpoints. On a quick look, POST https://www.needs.ca/wp-admin/admin-ajax.php is the endpoint they use. It returns JSON that looks like this:

{
  page: 1,
  total_page: 26,
  is_default: false,
  stores: [],
}

Here's the Python code you need to get it. I've tested the code, so it should work.

import requests

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) Gecko/20100101 Firefox/98.0",
    "Host": "www.needs.ca"
}
data = {"action":"search_nearest_stores","lng":"-0.1234","lat":"79.3453","page":"1"}

def main():
    res = requests.post("https://www.needs.ca/wp-admin/admin-ajax.php", headers=headers, data=data)
    if res.status_code == 200:
        print(res.text)
    else:
        print(res.status_code, res.reason)

if __name__ == "__main__":
    main()
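
Since the response reports page and total_page, the remaining results can presumably be collected by repeating the request with an increasing page value. The following is only a rough sketch under that assumption: it treats total_page as the last valid page number, reuses the example lat/lng values from above, and uses the standard library instead of requests so it runs without extra packages. The helper names fetch_page and fetch_all_stores are made up for illustration, and the shape of each entry in stores is not known from the sample above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

URL = "https://www.needs.ca/wp-admin/admin-ajax.php"
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_page(page, lat="79.3453", lng="-0.1234"):
    """POST one search request and return the decoded JSON as a dict."""
    form = {
        "action": "search_nearest_stores",
        "lat": lat,
        "lng": lng,
        "page": str(page),
    }
    # Passing data= makes urllib send a form-encoded POST request.
    req = Request(URL, data=urlencode(form).encode(), headers=HEADERS)
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


def fetch_all_stores(fetch=fetch_page):
    """Collect the "stores" lists from page 1 through "total_page"."""
    stores = []
    page = 1
    while True:
        payload = fetch(page)
        stores.extend(payload.get("stores", []))
        # Assumption: "total_page" is the number of the last valid page.
        if page >= int(payload.get("total_page", page)):
            break
        page += 1
    return stores
```

Calling fetch_all_stores() with no arguments would hit the live endpoint once per page. The fetch parameter is there so that a stub returning canned dictionaries can be passed in while experimenting, without sending real requests.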

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1