'Why getting different data in browser developer tools vs BeautifulSoap / Postman?
I want to scrap data from this web page
I want to get all the blogs...which are under result tag (<div class="results">)
In browser tools there it is showing under result tag there are 10 snippets...

But using Beautifulsoap I am getting
<div class="results">
</div>
and in postman getting same thing..
This is the way I am doing..
topicuri = "\"
r = s.get(topicuri)
soup = BeautifulSoup(r.text, 'html.parser')
pages = soup.find('div', {'class': 'results'})
print(pages)
Solution 1:[1]
You also can get data from api calls json response
import requests
import json
body= "vodafone"
headers= {
'content-type': 'application/json'
}
api_url = "https://search.donanimhaber.com/api/search/portal/?q=vodafone&p=3&devicetype=browsermobile&order=date_desc&in=all&contenttype=all&wordtype=both&daterange=all"
jsonData = requests.post(api_url, data=json.dumps(body), headers=headers).json()
for item in jsonData['contents']:
categoryName=item['categoryName']
print(categoryName)
Output:
Operatörler - Kurumsal Haberler
Operatörler - Kurumsal Haberler
Operatörler - Kurumsal Haberler
Mobil Aksesuarlar
Operatörler - Kurumsal Haberler
Kripto Para
Sinema ve Dizi
Mobil Oyunlar
Operatörler - Kurumsal Haberler
Operatörler - Kurumsal Haberler
Solution 2:[2]
The website is using Javascript to display the snippets. BeautifulSoup does not execute Javascript, while the browser does. You will probably want to use the Chromium engine in Python in order to web-scrape Javascript-based content.
Solution 3:[3]
As mentioned requests could not render JavaScript but there are two alternatives:
- Use
requestsand perform a post request on your url - Use
seleniumto get the renderedpage_sourceas you would expect it.
Example
from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
url = 'https://search.donanimhaber.com/portal?q=vodafone&p=3&devicetype=browsermobile&order=date_desc&in=all&contenttype=all&wordtype=both&range=all'
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.maximize_window()
wait = WebDriverWait(driver, 10)
driver.get(url)
wait.until(EC.presence_of_all_elements_located((By.XPATH, './/div[@class="results"]/div[@class="snippet"]')))
content = driver.page_source
soup = BeautifulSoup(content,"html.parser")
pages = soup.find_all('div', {'class': 'snippet'})
for p in pages:
print(p.h2.text.strip())
Output
Vodafone'dan dijital sa?l?k projelerine ücretsiz 5G deste?i
Vodafone'un son 15 y?lda Türkiye ekonomisine katk?s? aç?kland?
"Yar?n? Kodlayanlar" projesinde gençler afet sorunlar?na çözümler üretti
Küresel ak?ll? saat pazar? y?l?n ilk çeyre?inde yüzde 35 büyüdü
Vodafone Türkiye'nin ilk çeyrek sonuçlar? aç?kland?: Servis gelirlerinde yüzde 19 art??
Netflix'e yeni eklenen dizi ve filmleri takip edebilece?iniz site
Sony ve SinemaTV anla?t?! Spider-Man, Venom 2 ve daha fazlas? TV'de ilk kez SinemaTV'de yay?nlanacak
Vodafone ve Riot Games, Türkiye'nin ilk 5G Wild Rift turnuvas?n? duyurdu
Türkiye'de kaç ki?i numara ta??ma ile operatör de?i?tirdi?
Turkcell'in Ramazan'a özel Salla Kazan kampanyas? ba?lad?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Lau |
| Solution 3 | HedgeHog |
