Beautiful Soup articles scraping

Why does my code find only 5 articles instead of all 30 on the page?

Here is my code:

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd
    
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36'}
    
    url = 'https://www.15min.lt/tema/svietimas-24297'
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    
    antrastes = soup.find_all('h3', {'class': 'vl-title'})
    
    print(antrastes)


Solution 1:[1]

The page uses JavaScript to add items, but requests and BeautifulSoup can't run JavaScript, so they only see the articles that are already in the initial HTML.
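
A minimal offline sketch of that limitation. The HTML here is made up for illustration (it is not the real 15min.lt markup): it has two headlines in the static markup and a script that a browser would run to add a third. BeautifulSoup only parses the text it is given and never executes the script, so it finds just the static headlines.

```python
from bs4 import BeautifulSoup

# Hypothetical page: two static headlines plus a script that a real
# browser would execute to append a third one.
html = """
<html><body>
<div id="list">
  <h3 class="vl-title">First</h3>
  <h3 class="vl-title">Second</h3>
</div>
<script>
  var h = document.createElement('h3');
  h.className = 'vl-title';
  h.textContent = 'Third';
  document.getElementById('list').appendChild(h);
</script>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')
titles = [h.text.strip() for h in soup.find_all('h3', {'class': 'vl-title'})]
print(titles)  # the script-generated headline is missing
```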

You may need Selenium to control a real web browser, which can run JavaScript.
You may also need some JavaScript code to scroll the page.

Alternatively, you can check in DevTools in Firefox/Chrome whether JavaScript loads the data from some URL, and then try to request that URL with requests. You may need a Session to carry over cookies and headers from the first GET.
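
A rough sketch of the Session idea, in case the data URL turns out to depend on cookies from the main page. The helper names `make_session` and `fetch_json` are hypothetical, and whether this particular site needs the first GET at all is an assumption that would have to be checked.

```python
import requests


def make_session(user_agent, extra_headers=None):
    """Return a Session that sends the same headers on every request."""
    session = requests.Session()
    session.headers.update({'User-Agent': user_agent})
    if extra_headers:
        session.headers.update(extra_headers)
    return session


def fetch_json(session, page_url, data_url):
    """Load the main page first (to collect cookies), then the data URL."""
    session.get(page_url, timeout=10).raise_for_status()
    response = session.get(data_url, timeout=10)
    response.raise_for_status()
    return response.json()
```

The Session keeps any cookies set by the first response and sends them again with the second request, which is the part that plain `requests.get()` calls do not do.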


The script below uses a URL that I found in DevTools (tab: Network, filter: XHR).

You need to set a different offset (a date and time) in the URL to get different rows: url.format(offset).

If you use the current datetime, you don't even need to read the main page.
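
A small illustration of how such an offset can be prepared for the URL. It uses the same `'%Y-%m-%d %H:%M:%S'` format and `quote_plus` call as the script in this answer; the helper name `encode_offset` and the fixed example date are only for demonstration.

```python
import datetime
import urllib.parse


def encode_offset(moment):
    """Format a datetime as 'YYYY-MM-DD HH:MM:SS' and URL-encode it."""
    text = moment.strftime('%Y-%m-%d %H:%M:%S')
    # quote_plus turns the space into '+' and each ':' into '%3A'
    return urllib.parse.quote_plus(text)


example = datetime.datetime(2022, 3, 9, 21, 20, 36)
print(encode_offset(example))  # 2022-03-09+21%3A20%3A36
```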

The request needs the header 'X-Requested-With': 'XMLHttpRequest' to work.

The server sends JSON data with the keys rows (with HTML) and offset (with the datetime for the next rows).
I use this offset to get the next rows, and I run this in a loop to get more rows.

    import urllib.parse
    import requests
    from bs4 import BeautifulSoup
    import datetime

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36',
        # without this header the request does not work
        'X-Requested-With': 'XMLHttpRequest',
    }

    url = 'https://www.15min.lt/tags/ajax/list/svietimas-24297?tag=24297&type=&offset={}&last_row=2&iq=L&force_wide=true&cachable=1&layout%5Bw%5D%5B%5D=half_wide&layout%5Bw%5D%5B%5D=third_wide&layout%5Bf%5D%5B%5D=half_wide&layout%5Bf%5D%5B%5D=third_wide&cosite=default'

    # start from the current date and time
    offset = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')

    for _ in range(5):
        print('=====', offset, '=====')

        # URL-encode the space and colons in the datetime
        offset = urllib.parse.quote_plus(offset)

        response = requests.get(url.format(offset), headers=headers)

        data = response.json()

        # 'rows' holds an HTML fragment with the articles
        soup = BeautifulSoup(data['rows'], 'html.parser')
        antrastes = soup.find_all('h3', {'class': 'vl-title'})

        for item in antrastes:
            print(item.text.strip())
            print('---')

        offset = data['offset']  # offset for next data

Result:

===== 2022-03-09 21:20:36 =====
Konkursas „Praeities stipryb? – dabar?iai“. Susipažinkite su finalinink? darbais ir išrinkite nugal?tojus
---
ŠMSM ? ukrainie?i? vaik? ugdym? žada ?traukti ir atvykstan?ius mokytojus
---
Did?jant b?reli? Vilniuje finansavimui, tikimasi ?traukti ir ukrainie?i? vaikus
---
Myl?ti priešus – ne glostyti palei plauk?
---
Atvira pamoka su prof. Alfredu Bumblausku: „K? reik?t? žinoti apie Ukrainos istorij??“
---
===== 2022-03-04 13:20:21 =====
Vilnie?iams vaikams – didesnis neformaliojo švietimo krepšelis
---
Premjer?: sud?tingiausiose situacijoje mokslo ir mokslinink? svarba tik did?ja
---
Prasideda pri?mimas ? sostin?s mokyklas: k? svarbu žinoti?
---
Dešimtokai lietuvi? kalbos ir matematikos pasiekimus geguž? tikrinsis nuotoliniu b?du
---
Vilniuje prasideda pri?mimas ? mokyklas
---
===== 2022-03-01 07:09:05 =====
Nuotolin? istorijos pamoka apie Ukrain? sulauk? 30 t?kst. perži?r?
---
J.Šiugždinien?: po Ukrainos pergal?s bendradarbiavimas su šia herojiška valstybe tik did?s
---
Vilniaus savivaldyb? svarsto ?kurdinti moksleivius buvusiame „Ignitis“ pastate
---
Socialdemokratai ragina stabdyti švietimo ?staig? tinklo pertvark?
---
Poky?iai mokyklin?je literat?ros programoje: mažiau privalom? autori?, brandos egzaminas – iš keli? dali?
---
===== 2022-02-26 11:04:29 =====
Mokytojo Gy?io „pagalbos“ – žygis, puodas ir uodas
---
Nuo kovo 2-osios pradinukams klas?se nebereik?s d?v?ti kauki?
---
Dr. Aust?ja Landsbergien?: Matematikos nerimas – kas tai ir ar ?manoma išvengti?
---
Ukrainos palaikymui – visuotin? istorijos pamoka Lietuvos mokykloms
---
Mokinius kvie?ia didžiausias chemijos dalyko konkursas Lietuvoje
---
===== 2022-02-23 10:11:14 =====
Mokykl? tinklo stiprinimas savivaldyb?se: klausimai ir atsakymai
---
Vaiko ir paauglio kelias ? s?km?, arba Kaip gauti Nobelio premij?
---
Geriausias ugdymas – žygis, laužas, puodas ir uodas
---
Vilija Targamadz?: Bendrojo ugdymo mokykl? reformatoriai, ar ir toliau s?site kakofonij??
---
Švietimo ministr?: tai, kad turime sujungtas 5–8 klases, yra kažkas baisaus
---
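
The offset-chaining loop from the script can also be written as a small generator, which makes it possible to try the logic offline. This is only a sketch: `iter_pages`, `fetch_page`, and the fake pages are hypothetical names and made-up data, and stopping on an empty offset is an assumption about how the server signals the end.

```python
def iter_pages(fetch_page, start_offset, max_pages):
    """Yield the 'rows' HTML of each page, following 'offset' to the next."""
    offset = start_offset
    for _ in range(max_pages):
        data = fetch_page(offset)
        yield data['rows']
        offset = data.get('offset')
        if not offset:  # assumption: no offset means there are no more rows
            break


# Offline stand-in for the server, keyed by offset (made-up data).
FAKE_PAGES = {
    'start': {'rows': '<h3 class="vl-title">A</h3>', 'offset': 'older'},
    'older': {'rows': '<h3 class="vl-title">B</h3>', 'offset': None},
}

pages = list(iter_pages(FAKE_PAGES.__getitem__, 'start', max_pages=5))
print(len(pages))  # 2
```

With the real site, `fetch_page` would be a function that URL-encodes the offset, calls `requests.get(url.format(...), headers=headers)`, and returns `response.json()`, as the script does inside its loop.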

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1