Using requests in Python to download an xls file

On this page you will find a link to download an xls file (under the attachments section, "Adjuntos"): https://www.banrep.gov.co/es/emisiones-vigentes-el-dcv

The link to download the xls file is: https://www.banrep.gov.co/sites/default/files/paginas/emisiones/EMISIONES.xls

I was using this code to automatically download that file:

import requests
import os

path = os.path.abspath(os.getcwd())  # where the file will be downloaded

path = path.replace("\\", '/')+'/'

url = 'https://www.banrep.gov.co/sites/default/files/paginas/emisiones/EMISIONES.xls'

myfile = requests.get(url, verify=False)

open(path+'EMISIONES.xls', 'wb').write(myfile.content)

This code was working well, but suddenly the downloaded file started coming out corrupted.

If I run the code, it raises this warning:

InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.banrep.gov.co'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(


Solution 1:[1]

The error is related to how your request is being built. The status_code returned by the request is 403 (Forbidden), so the server is refusing the request and the saved content is not the real xls file. You can see it by typing

myfile.status_code
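
As a minimal sketch of that check, the download could refuse to write anything unless the request succeeded. The helper name save_response is hypothetical (it is not part of requests), and the timeout value is an arbitrary choice. Certificate verification is left at its default here; if that fails on your machine, that is a separate problem from the 403.

```python
import requests


def save_response(response, dest_path):
    """Write the response body to dest_path only if the request succeeded.

    Hypothetical helper for illustration; not part of the requests library.
    """
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses,
    # so an error page is never saved under an .xls name.
    response.raise_for_status()
    with open(dest_path, "wb") as fh:
        fh.write(response.content)
    return dest_path


if __name__ == "__main__":
    url = "https://www.banrep.gov.co/sites/default/files/paginas/emisiones/EMISIONES.xls"
    myfile = requests.get(url, timeout=30)
    # Inspect what the server actually returned before trusting the file.
    print(myfile.status_code, myfile.headers.get("Content-Type"))
    save_response(myfile, "EMISIONES.xls")
```

With a 403 response this raises requests.HTTPError instead of silently writing a corrupted file.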

I suspect the server is rejecting the request because of missing cookies or headers in your GET request. I suggest inspecting which headers your browser sends when it requests the URL you're using.

TIP: open your web browser's developer tools and use the Network tab to identify the request headers.

To handle the cookies, use requests.Session and first request a regular page on www.banrep.gov.co, so that the session picks up the cookies the site sets, before requesting the file with the same session:

session_ = requests.Session()

Before coding, you could test your requests with Postman or another REST API testing tool.
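
Putting the session and header suggestions together, a rough sketch might look like the following. The header values in BROWSER_HEADERS are illustrative assumptions, not values confirmed to satisfy this server; copy the real ones from the Network tab. The helper names build_session and download are hypothetical, and there is no guarantee that this alone clears the 403.

```python
import requests

LANDING_URL = "https://www.banrep.gov.co/es/emisiones-vigentes-el-dcv"
FILE_URL = "https://www.banrep.gov.co/sites/default/files/paginas/emisiones/EMISIONES.xls"

# Assumed, illustrative values; replace them with the headers your browser sends.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "*/*",
    "Referer": LANDING_URL,
}


def build_session(headers=BROWSER_HEADERS):
    """Create a session that sends the given headers on every request."""
    session_ = requests.Session()
    session_.headers.update(headers)
    return session_


def download(session_, landing_url, file_url, dest_path):
    """Visit the landing page, then fetch the file with the same session."""
    # The first request lets the session store any cookies the site sets.
    session_.get(landing_url, timeout=30)
    response = session_.get(file_url, timeout=30)
    # Stop here on 4xx/5xx instead of saving an error page as the xls file.
    response.raise_for_status()
    with open(dest_path, "wb") as fh:
        fh.write(response.content)
    return dest_path


if __name__ == "__main__":
    download(build_session(), LANDING_URL, FILE_URL, "EMISIONES.xls")
```

If the server still answers 403, compare the headers and cookies the session sends with those shown in the browser's Network tab and adjust BROWSER_HEADERS accordingly.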

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Tomerikoo