Other than headers and cookies, what else does a web scraper need to replicate a request?
I'm trying to download a .pdf from a website using Python's requests module. I know the request should have the following characteristics:
method: GET
link: https://compranet.hacienda.gob.mx/esop/toolkit/DownloadProxy/55335129?verify=13&oid=56061213
two URL params:
- verify = 13
- oid = 56061213
and the session cookies: JSESSIONID, VISITOR_ET, and VISITORID (which I retrieve using a Firefox driver, also in Python).
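Since the cookies come from a Firefox driver, one way to hand them to requests is to copy the Selenium-style cookie dicts into a `requests.Session`. This is a minimal sketch; the cookie values below are placeholders standing in for the real values returned by `driver.get_cookies()`:

```python
import requests

def cookies_to_session(cookie_dicts, session=None):
    """Copy Selenium-style cookie dicts ({'name': ..., 'value': ...})
    into a requests.Session so subsequent requests carry them."""
    session = session or requests.Session()
    for c in cookie_dicts:
        session.cookies.set(c["name"], c["value"])
    return session

# In practice this list would come from driver.get_cookies();
# these placeholder values only illustrate the shape of that data.
selenium_cookies = [
    {"name": "JSESSIONID", "value": "placeholder"},
    {"name": "VISITOR_ET", "value": "placeholder"},
    {"name": "VISITORID", "value": "placeholder"},
]
session = cookies_to_session(selenium_cookies)
```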
Even though I've replicated every single header and cookie (inspected with the Firefox developer tools), I'm still getting a 400 Bad Request response.
Is there any other parameter I should pay attention to? As an aside, I'm very confident that the site uses the Dojo Toolkit.
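One way to debug a mismatch like this is to build the request without sending it and inspect exactly what requests would put on the wire, then diff that against the browser's network tab. A sketch using the parameters from the question (cookie and header values are placeholders, not real session data):

```python
import requests

session = requests.Session()
req = requests.Request(
    "GET",
    "https://compranet.hacienda.gob.mx/esop/toolkit/DownloadProxy/55335129",
    params={"verify": "13", "oid": "56061213"},
    cookies={
        "JSESSIONID": "placeholder",
        "VISITOR_ET": "placeholder",
        "VISITORID": "placeholder",
    },
    headers={
        # Placeholder values; copy the real ones from the browser.
        "User-Agent": "Mozilla/5.0",
        "Referer": "https://compranet.hacienda.gob.mx/esop/toolkit/opportunity/past/2072615/detail.si",
    },
)
prepared = session.prepare_request(req)
print(prepared.url)      # full URL including ?verify=13&oid=56061213
print(prepared.headers)  # includes the Cookie header that will be sent

# To actually perform the download once the request matches the browser's:
# resp = session.send(prepared)
# if resp.ok:
#     with open("document.pdf", "wb") as f:
#         f.write(resp.content)
```

Comparing `prepared.headers` against the browser request can reveal differences that are easy to miss, such as a missing Referer or a cookie that was set on a different path.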
If you want to try it yourself you can...
get your session cookies here: https://compranet.hacienda.gob.mx/esop/toolkit/opportunity/past/list.si?reset=true&resetstored=true&userAct=changeLangIndex&language=es_MX&_ncp=1648087723809.6014-1
the sub-site where you would click the download link: https://compranet.hacienda.gob.mx/esop/toolkit/opportunity/past/2072615/detail.si
The actual download link is above, but I paste it here again: https://compranet.hacienda.gob.mx/esop/toolkit/DownloadProxy/55335129?verify=13&oid=56061213
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
