BeautifulSoup Assistance

I am trying to scrape the following website (https://www.english-heritage.org.uk/visit/blue-plaques/#?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0) and ultimately am interested in storing some of the data inside each 'li class="search-result-item"' to perform further analytics.

Example of one "search-result-item"

I want to capture the contents of the <h3>, <span class="plaque-role"> and <span class="plaque-location"> elements in a Python dictionary:

<li class="search-result-item"><a href="/visit/blue-plaques/helen-gwynne-vaughan/"><img class="search-result-image max-width" src="/siteassets/home/visit/blue-plaques/find-a-plaque/blue-plaques-f-j/helen-gwynne-vaughan-plaque.jpg?w=732&amp;h=465&amp;mode=crop&amp;scale=both&amp;cache=always&amp;quality=60&amp;anchor=&amp;WebsiteVersion=20220516171525" alt="" title=""><div class="search-result-info"><h3>GWYNNE-VAUGHAN, Dame Helen (1879-1967)</h3><span class="plaque-role">Botanist and Military Officer</span><span class="plaque-location">Flat 93, Bedford Court Mansions, Fitzrovia, London, WC1B 3AE, London Borough of Camden</span></div></a></li>

So far I am trying to isolate all the "search-result-item" elements, but my current code prints absolutely nothing. If someone can help me sort that problem out and point me in the right direction for storing each data element in a Python dictionary, I would be very grateful.

from bs4 import BeautifulSoup
import requests

url = 'https://www.english-heritage.org.uk/visit/blue-plaques/#?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0'
page = requests.get(url)

soup = BeautifulSoup(page.text, 'html.parser')
#print(soup.prettify())
# find_all() returns a list of tags, so get_text() has to be called on each tag
for item in soup.find_all(class_='search-result-item'):
    print(item.get_text())


Solution 1:[1]

The content is generated dynamically by JavaScript, so the elements you are looking for are not present in the HTML that requests downloads, and BeautifulSoup cannot find them. Use the site's API instead.
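A related detail: everything after the # in a URL is a fragment. It stays on the client and is not sent to the server, so the pageBP, sizeBP and other parameters in the question's URL never reach the site. The sketch below uses only the standard library to pull those parameters out of the fragment and attach them to the API endpoint used in this answer. Treat it as an illustration, not as a documented contract: it assumes the API accepts the same parameter names, which the example URL in this answer suggests.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

# URL from the question: the search parameters sit after '#', i.e. in the fragment
page_url = ("https://www.english-heritage.org.uk/visit/blue-plaques/"
            "#?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0")

parts = urlsplit(page_url)
print(repr(parts.query))   # empty: nothing here is sent as a query string
print(parts.fragment)      # the parameters live in the fragment instead

# Parse the fragment like a query string, keeping blank values such as keyBP=
raw = parse_qs(parts.fragment.lstrip("?"), keep_blank_values=True)
params = {key: values[0] for key, values in raw.items()}

# API endpoint taken from this answer; reusing the parameter names is an assumption
api_base = ("https://www.english-heritage.org.uk/api/BluePlaqueSearch/"
            "GetMatchingBluePlaques")
api_url = f"{api_base}?{urlencode(params)}"
print(api_url)
```

Changing params["pageBP"] before building the URL would request a different page of results, assuming the API paginates the same way the website does.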

Example

import requests

url = 'https://www.english-heritage.org.uk/api/BluePlaqueSearch/GetMatchingBluePlaques?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0'
page = requests.get(url).json()

# Keep only the fields of interest from each plaque record
keys = ['title', 'professions', 'address']
data = []
for e in page['plaques']:
    data.append({k: v for k, v in e.items() if k in keys})
print(data)

Output
[{'title': 'GWYNNE-VAUGHAN, Dame Helen (1879-1967)', 'address': 'Flat 93, Bedford Court Mansions, Fitzrovia, London, WC1B 3AE, London Borough of Camden', 'professions': 'Botanist and Military Officer'}, {'title': 'READING, Lady Stella (1894-1971)', 'address': '41 Tothill Street, London, City of Westminster, SW1H 9LQ, City Of Westminster', 'professions': "Founder of the Women's Voluntary Service"}, {'title': '32 SOHO SQUARE', 'address': '32 Soho Square, Soho, London, W1D 3AP, City Of Westminster', 'professions': 'Botanists'}, {'title': '14 BUCKINGHAM STREET', 'address': '14 Buckingham Street, Covent Garden, London, WC2N 6DF, City Of Westminster', 'professions': 'Statesman, Diarist, Naval Official, Painter'}, {'title': 'ABRAHAMS, Harold (1899-1978)', 'address': 'Hodford Lodge, 2 Hodford Road, Golders Green, London, NW11 8NP, London Borough of Barnet', 'professions': 'Athlete'}, ...]
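To get dictionaries keyed the way the question describes them (name, role, location), the API fields can be renamed while filtering. The following is a minimal sketch, assuming the response has the shape shown in the output above. The names FIELD_MAP and extract_plaques, the new key names, and the extra "id" field in the sample are all hypothetical and used only for illustration.

```python
# Assumed mapping from API field names (as seen in the output above)
# to friendlier keys; the new key names are illustrative only.
FIELD_MAP = {"title": "name", "professions": "role", "address": "location"}


def extract_plaques(payload, field_map=FIELD_MAP):
    """Return one dict per plaque, keeping and renaming only the mapped fields."""
    records = []
    for plaque in payload.get("plaques", []):
        # .get() yields None instead of raising if a field is missing
        records.append({new: plaque.get(old) for old, new in field_map.items()})
    return records


# Hand-made payload shaped like the response shown above
sample_payload = {
    "plaques": [
        {
            "title": "GWYNNE-VAUGHAN, Dame Helen (1879-1967)",
            "professions": "Botanist and Military Officer",
            "address": ("Flat 93, Bedford Court Mansions, Fitzrovia, London, "
                        "WC1B 3AE, London Borough of Camden"),
            "id": 1,  # hypothetical extra field, dropped by the mapping
        }
    ]
}

records = extract_plaques(sample_payload)
print(records[0]["name"])
print(records[0]["role"])
```

With a live response, the same function could be called as extract_plaques(requests.get(url).json()).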

Solution 2:[2]

Check the available statistics: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Statistics-definitions.html

Percentile rank (PR) should work in your case. Try PR(10:300) and set the threshold to 30%.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Tomasz Bre?