Python Requests returns 403 error when downloading a PDF file

I have been trying to download a PDF file using requests but, no matter what I do, it keeps returning a 403 status code and the downloaded PDF cannot be opened.

Here is the code I am running:

import requests

url_pdf = 'https://www.agerborsamerci.it/wp-content/uploads/2022/01/Settimanale-n.-2-del-20-Gennaio-2022-%E2%80%93-Listino-Borsa-n.-2.pdf'

# session = requests.Session()

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36",
    "Accept": "image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8",
    "Cache-Control": "image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8",
    "host-header": "6b7412fb82ca5edfd0917e3957f05d89",
    "Accept-Encoding": "gzip, deflate, br",
    "cache-control": "image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8",
    "Connection": "keep-alive",
    "referer": "https://www.agerborsamerci.it/wp-content/uploads/2022/01/Settimanale-n.-2-del-20-Gennaio-2022-%E2%80%93-Listino-Borsa-n.-2.pdf"
}

req = requests.get(url_pdf, headers=headers)
print(req.status_code)

# The with block closes the file automatically; no explicit close is needed.
with open("bologna.pdf", 'wb') as f:
    f.write(req.content)

As you can see, I have tried using a 'Session' object and setting different 'User-Agent' values as well as other headers, but nothing seems to work.

I have also tried using

import os
name = 'bologna.pdf'
os.system('wget {} -O {}'.format(url_pdf, name))

But it is not working either.
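In the wget attempt, os.system hands the whole string to a shell, and the snippet discards the exit status it returns, so nothing on the Python side reports why the download failed. A minimal sketch using subprocess.run with an argument list (no shell, so the URL needs no quoting) and check=True follows. wget_command and download_with_wget are hypothetical helper names. Per the wget manual, exit status 8 means the server issued an error response (such as a 403). This only makes the failure visible; it does not by itself get past the 403.

```python
import shutil
import subprocess


def wget_command(url: str, output_path: str) -> list[str]:
    """Build the wget argument list (hypothetical helper).

    No shell is involved, so characters such as % in the URL need no quoting.
    """
    return ["wget", "-O", output_path, url]


def download_with_wget(url: str, output_path: str) -> None:
    """Run wget and raise CalledProcessError if it exits with a non-zero status."""
    if shutil.which("wget") is None:
        raise FileNotFoundError("wget is not on PATH")
    # check=True turns a non-zero exit status (wget uses 8 when the server
    # answers with an error such as 403) into an exception instead of
    # letting the script carry on as if the file had been downloaded.
    subprocess.run(wget_command(url, output_path), check=True)
```

Usage with the question's variables would be download_with_wget(url_pdf, name); a 403 then surfaces as subprocess.CalledProcessError with returncode 8.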

Do you have any idea what I could do to overcome this problem? I am really struggling to figure it out.

Thanks a lot!

python python-requests

Solution 1:^[1]

A 403 error means that you do not have permission to access the page.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403

Per the link above,

The HTTP 403 Forbidden response status code indicates that the server understands the request but refuses to authorize it.

I would recommend finding out which permission is required to access that site/page.
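A practical consequence of the 403: the question's script writes req.content to bologna.pdf whatever the status code is, so on a 403 the file most likely contains the server's error page rather than a PDF, which would explain why it cannot be opened. A minimal stdlib-only sketch of checking this before saving is below. describe_status and check_pdf_download are hypothetical helper names, and treating a missing %PDF- signature as a failed download is an assumption (well-formed PDF files normally begin with it), not a full validation.

```python
from http import HTTPStatus

# Well-formed PDF files normally begin with this signature; used here only
# as a cheap sanity check, not as a full PDF validation.
PDF_SIGNATURE = b"%PDF-"


def describe_status(code: int) -> str:
    """Return a label such as '403 Forbidden' (ValueError for unknown codes)."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


def check_pdf_download(status_code: int, content: bytes) -> None:
    """Raise ValueError unless the response looks like a successful PDF download.

    Hypothetical helper, not part of requests.
    """
    if not 200 <= status_code < 300:
        # The body of a 403 response is often an HTML error page, not the file.
        raise ValueError(f"server answered {describe_status(status_code)}; not saving the body")
    if not content.startswith(PDF_SIGNATURE):
        raise ValueError("body does not start with %PDF-, so it is probably not a PDF")
```

With the question's code, calling check_pdf_download(req.status_code, req.content) just before the with open(...) block would turn a silently broken bologna.pdf into an explicit error message.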

Solution 2:^[2]

Avoid sending custom headers unless they are required. Try the anonymous default first (the server still sees your IP address); the download only takes about 2 seconds:

curl -o bologna.pdf  https://www.agerborsamerci.it/wp-content/uploads/2022/01/Settimanale-n.-2-del-20-Gennaio-2022-%E2%80%93-Listino-Borsa-n.-2.pdf

This works on my Windows 7 machine with curl installed, and it should work out of the box on Windows 10 or 11, which ship with curl.

>curl -o bologna.pdf https://www.agerborsamerci.it/wp-content/uploads/2022/01/Settimanale-n.-2-del-20-Gennaio-2022-%E2%80%93-Listino-Borsa-n.-2.pdf
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  346k  100  346k    0     0   117k      0  0:00:02  0:00:02 --:--:--  117k
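A rough Python counterpart of the curl command is to call requests.get without a headers= argument, so only the headers requests adds by default are sent. This is a sketch under that assumption: requests still identifies itself with its own User-Agent (python-requests/<version>), which differs from curl's, so I cannot confirm it avoids the 403 for this particular site. download_with_default_headers is a hypothetical name. The sketch also calls raise_for_status() before opening the output file, so a 4xx/5xx error page is never saved as bologna.pdf.

```python
import requests


def download_with_default_headers(url: str, path: str, timeout: float = 30.0) -> int:
    """Save url to path, sending only the headers requests adds by default.

    Returns the number of bytes written. A 4xx/5xx answer raises
    requests.HTTPError before the output file is opened.
    """
    # No headers= argument: mirrors the "avoid custom headers" advice.
    # stream=True fetches the body in chunks instead of all at once.
    with requests.get(url, stream=True, timeout=timeout) as resp:
        resp.raise_for_status()
        written = 0
        with open(path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                f.write(chunk)
                written += len(chunk)
    return written
```

With the question's variables this would be download_with_default_headers(url_pdf, "bologna.pdf"); if the server still refuses, the resulting requests.HTTPError carries the response, so e.response.status_code shows the 403.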

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Akib Rhast
Solution 2
