Using requests in Python to download an xls file
On this page there is a link to download an xls file (under "Attachments", shown as "Adjuntos" on the page): https://www.banrep.gov.co/es/emisiones-vigentes-el-dcv
The link to download the xls file is: https://www.banrep.gov.co/sites/default/files/paginas/emisiones/EMISIONES.xls
I was using this code to automatically download that file:
import requests
import os
path = os.path.abspath(os.getcwd())  # where the file will be downloaded
path = path.replace("\\", '/')+'/'
url = 'https://www.banrep.gov.co/sites/default/files/paginas/emisiones/EMISIONES.xls'
myfile = requests.get(url, verify=False)
open(path+'EMISIONES.xls', 'wb').write(myfile.content)
This code was working well, but suddenly the downloaded file started coming out corrupted.
If I run the code, it raises this warning:
InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.banrep.gov.co'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
Solution 1:[1]
The error is related to how your request is being built. The status code returned by the request is 403 (Forbidden). You can check it by typing:
myfile.status_code
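A 403 response still has a body (usually an error page), which is what ends up written to disk and looks like a "corrupted" file. As a minimal sketch, the check below inspects the status code and the first bytes of the content before anything is saved. The function name `diagnose_download` is hypothetical, and the signature test assumes the file is a legacy binary .xls, which is stored in the OLE2 compound file format.

```python
# Signature at the start of OLE2 compound files (the container used by legacy .xls).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def diagnose_download(status_code: int, content: bytes) -> str:
    """Return a short description of what the server actually sent back.

    Hypothetical helper for illustration; it does not perform any request itself.
    """
    if status_code != 200:
        return f"HTTP {status_code}: the server did not return the file"
    if content.startswith(OLE2_MAGIC):
        return "ok: content starts with the OLE2 signature used by .xls files"
    if content.lstrip()[:1] == b"<":
        return "HTTP 200 but the body looks like HTML/XML, not a spreadsheet"
    return "HTTP 200 but the body does not start with the .xls signature"
```

With the code from the question, this could be called as `diagnose_download(myfile.status_code, myfile.content)` before writing the file.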
I suspect the server is rejecting the request because of missing cookies and headers, so I suggest inspecting the headers your browser sends when it requests the URL you're using.
TIP: open your web browser's developer tools and use the Network tab to identify the headers.
To handle the cookies, first request an earlier page on www.banrep.gov.co so that the cookies get set the way they would in a browser, using requests.Session:
session_ = requests.Session()
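The session idea could be sketched as follows. The header values are assumptions for illustration only (copy the real ones from the Network tab), the function names are hypothetical, and it is not verified that this is enough to get past the 403, since the site may apply other checks.

```python
import requests

LANDING_URL = "https://www.banrep.gov.co/es/emisiones-vigentes-el-dcv"
FILE_URL = "https://www.banrep.gov.co/sites/default/files/paginas/emisiones/EMISIONES.xls"

# Assumption: example browser-like headers, not values confirmed to satisfy the site.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "es-CO,es;q=0.9,en;q=0.8",
    "Referer": LANDING_URL,
}


def build_session() -> requests.Session:
    """Create a session that sends the headers above with every request."""
    session = requests.Session()
    session.headers.update(BROWSER_HEADERS)
    return session


def download_emisiones(dest: str = "EMISIONES.xls") -> None:
    """Visit the landing page first so the session stores its cookies, then fetch the file."""
    session = build_session()
    session.get(LANDING_URL, timeout=30)  # cookies set here are reused below
    response = session.get(FILE_URL, timeout=30)
    response.raise_for_status()  # raises requests.HTTPError on 403 instead of saving an error page
    with open(dest, "wb") as fh:
        fh.write(response.content)


# Usage (performs real network requests):
# download_emisiones("EMISIONES.xls")
```

Certificate verification is left at its default here, so the InsecureRequestWarning from the question does not appear; if verification fails on your machine, that is a separate problem from the 403.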
Before coding, you could test your requests with Postman or other REST API testing software.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tomerikoo |
