Python Requests returns 403 error when downloading a PDF file

I have been trying to download a PDF file using requests but, no matter what I do, it keeps returning 403 as status and it is impossible to open the downloaded PDF.

Here is the code I am running:

import requests

url_pdf = 'https://www.agerborsamerci.it/wp-content/uploads/2022/01/Settimanale-n.-2-del-20-Gennaio-2022-%E2%80%93-Listino-Borsa-n.-2.pdf'

# session = requests.Session()

# Request headers I have tried sending
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36",
    "Accept": "image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8",
    "Cache-Control": "image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8",
    "host-header": "6b7412fb82ca5edfd0917e3957f05d89",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "referer": "https://www.agerborsamerci.it/wp-content/uploads/2022/01/Settimanale-n.-2-del-20-Gennaio-2022-%E2%80%93-Listino-Borsa-n.-2.pdf",
}

req = requests.get(url_pdf, headers=headers)
print(req.status_code)  # prints 403

# The response body is written to disk regardless of the status code
with open("bologna.pdf", 'wb') as f:
    f.write(req.content)

As you can see, I have tried using a 'Session' object and setting different 'User-Agent' values as well as other headers, but nothing seems to work.

I have also tried using

import os
name='bologna.pdf'    
os.system('wget {} -O {}'.format(url_pdf,name))

But it is not working either.

Do you have any idea what I could do to overcome this problem? I am really struggling to figure it out.

Thanks a lot!



Solution 1:[1]

A 403 error means that you do not have permission to access the page.

Per the MDN documentation on 403 Forbidden:

The HTTP 403 Forbidden response status code indicates that the server understands the request but refuses to authorize it.

I would recommend finding out which permission is required to access that site or page.
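
One reason the saved file will not open is that the body of a 403 response is typically an error page rather than PDF data, so writing req.content to disk unconditionally produces a broken file. Below is a minimal sketch of guarding against that. The helper names looks_like_pdf and save_pdf_response are hypothetical (they are not part of requests), and the check relies on the "%PDF-" signature that PDF files normally start with:

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # signature that PDF files normally begin with


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF signature."""
    return data.startswith(PDF_MAGIC)


def save_pdf_response(status_code: int, content: bytes, path: str) -> bool:
    """Write content to path only for a 200 response carrying PDF data.

    Returns True if the file was written, False otherwise.
    """
    if status_code != 200 or not looks_like_pdf(content):
        return False
    Path(path).write_bytes(content)
    return True
```

With the question's code this would be called as save_pdf_response(req.status_code, req.content, "bologna.pdf"). When it returns False, printing req.status_code and the first few hundred characters of req.text may show what the server objected to.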

Solution 2:[2]

Avoid sending headers unless they are required. Try an anonymous request with the defaults first (the server still sees your IP details); it takes only about 2 seconds to download:

curl -o bologna.pdf  https://www.agerborsamerci.it/wp-content/uploads/2022/01/Settimanale-n.-2-del-20-Gennaio-2022-%E2%80%93-Listino-Borsa-n.-2.pdf

This works on my Windows 7 machine with curl added, and it should work out of the box on Windows 10 or 11:

>curl -o bologna.pdf https://www.agerborsamerci.it/wp-content/uploads/2022/01/Settimanale-n.-2-del-20-Gennaio-2022-%E2%80%93-Listino-Borsa-n.-2.pdf
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  346k  100  346k    0     0   117k      0  0:00:02  0:00:02 --:--:--  117k
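
The same idea carries over to requests: send the request with the library defaults first and add headers only if that fails. Below is a minimal sketch. The helper download_default and its injectable get parameter are hypothetical (get is there only so the function can be exercised without network access), and the curl result above does not confirm that the server accepts the default requests User-Agent:

```python
import requests


def download_default(url, dest, get=requests.get, timeout=30):
    """Fetch url with the default headers and write the body to dest.

    Returns the HTTP status code; the file is written only on 200.
    """
    resp = get(url, timeout=timeout)  # no custom headers are sent
    if resp.status_code == 200:
        with open(dest, "wb") as f:
            f.write(resp.content)
    return resp.status_code
```

Calling download_default(url_pdf, "bologna.pdf") returns the status code, so a 403 is visible immediately and no unreadable file is left on disk.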


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Akib Rhast
Solution 2