403 Forbidden error when using Python BeautifulSoup and Requests, even with all headers set
As the title states, I am getting a 403 error. The URLs generated are valid: I can print them and open them in my browser just fine. I send the full set of request headers and still get 403 Forbidden. Can someone help me solve it?
import requests
from bs4 import BeautifulSoup

# Headers copied from the browser's request
header = {
    "sec-ch-ua": '" Not A;Brand";v="99", "Chromium";v="99", "Microsoft Edge";v="99"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36 Edg/99.0.1150.30",
}

url = "https://www.nadirkitap.com/"
response = requests.get(url, headers=header)
print(response.status_code)  # prints 403
Solution 1:[1]
Take a look at the response text: it says the site is protected by Cloudflare and that you need to enable JavaScript. Because requests does not execute JavaScript, it cannot pass this check, so you could use selenium instead.
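To check the response text programmatically instead of reading it by eye, something like the following could work. This is only a rough sketch: the helper names `looks_like_cloudflare_challenge` and `diagnose` are made up for illustration, and the marker phrases and header checks are assumptions about what a Cloudflare challenge page typically contains, not a documented contract.

```python
def looks_like_cloudflare_challenge(status_code, headers, body):
    """Heuristically guess whether a response is a Cloudflare challenge page."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {str(k).lower(): str(v).lower() for k, v in headers.items()}
    served_by_cloudflare = (
        "cloudflare" in lowered.get("server", "") or "cf-ray" in lowered
    )
    # Assumed marker phrases; the wording of the challenge page may change.
    markers = ("enable javascript", "just a moment", "cloudflare")
    text = body.lower()
    body_mentions_challenge = any(marker in text for marker in markers)
    # Only treat blocking status codes as a possible challenge.
    return status_code in (403, 503) and (
        served_by_cloudflare or body_mentions_challenge
    )


def diagnose(url):
    """Fetch url with requests and report whether it looks like a challenge."""
    import requests  # imported lazily so the helper above has no dependencies

    response = requests.get(url, timeout=10)
    return looks_like_cloudflare_challenge(
        response.status_code, response.headers, response.text
    )
```

Calling `diagnose("https://www.nadirkitap.com/")` and getting `True` would suggest that adding more headers will not help and that a real browser (for example via selenium) is needed.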
Example
This builds a BeautifulSoup object from driver.page_source and prints a list of book titles:
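from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

service = Service('PATH TO YOUR CHROMEDRIVER')
driver = webdriver.Chrome(service=service)

# A real browser runs the JavaScript check that requests cannot
driver.get('https://www.nadirkitap.com/')

# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(driver.page_source, 'html.parser')
print([t['title'] for t in soup.select('a[title]')])

driver.quit()  # close the browser when done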
Output
['İkinci el kitap, yeni kitap, dergi, efemera', 'İkinci el kitap, yeni kitap, dergi, efemera', 'İkinci el kitap ve yeni kitap', 'Bilim ve Teknik Kitapları', 'Çizgi Roman Kitapları', 'Çocuk Kitapları', 'Dini Kitaplar', 'Edebiyat Kitapları', 'Ekonomi ve İş Dünyası Kitapları', 'Felsefe Kitapları', 'Hukuk Kitapları', 'Osmanlıca Kitaplar',...]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | HedgeHog |
