Python parsing the site gives <html></html>
There is a website that I need to scrape and analyze.
However, when I request it, the response body is just <html></html>.
I tried changing the User-Agent and the cookies, but that doesn't help.
from bs4 import BeautifulSoup
import httpx

# Plain HTTP request: httpx does not execute any JavaScript on the page
response = httpx.get('https://lolz.guru/market/')
soup = BeautifulSoup(response.text, 'lxml')
print(response.text)  # prints <html></html>
Solution 1:[1]
If that site requires a real browser, you could try driving a real browser to retrieve the page and its data. Selenium is a tool intended for testing web applications, but in essence it runs scripts that imitate a user interacting with a web browser, so it can be used to load pages as well.
There are nice tutorials out there, also for using Selenium from Python.
It also supports cookies: https://www.selenium.dev/documentation/webdriver/browser/cookies/
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("http://www.example.com")
# Adds the cookie into current browser context
driver.add_cookie({"name": "key", "value": "value"})
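To tie this back to the question, here is a minimal sketch of loading the page through Selenium and checking whether the result is still empty. The helper names (`fetch_rendered_html`, `is_empty_page`, `scrape_with_chrome`) are made up for illustration, and it assumes Chrome and a matching driver are available on your machine. I can't confirm that this particular site will serve its content to an automated browser, so treat it as a starting point.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects non-whitespace text outside <script>/<style> elements."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def is_empty_page(html):
    """Return True when the markup carries no text, e.g. '<html></html>'."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return not parser.chunks


def fetch_rendered_html(driver, url):
    """Load url in the given WebDriver and return the page source it reports."""
    driver.get(url)
    return driver.page_source


def scrape_with_chrome(url):
    """Hypothetical entry point: open Chrome, fetch the page, always close it."""
    # Imported here so the helpers above work without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        return fetch_rendered_html(driver, url)
    finally:
        driver.quit()
```

You would then call `html = scrape_with_chrome('https://lolz.guru/market/')` and, if `is_empty_page(html)` is false, pass `html` to `BeautifulSoup(html, 'lxml')` exactly as in the question. Keeping the driver as a parameter of `fetch_rendered_html` means you can reuse a browser session in which you have already added cookies.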
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
