How can I download a file in Python 3 with urlopen() or add custom headers to urlretrieve()?
tl;dr: I want to download a file from a server that only allows certain User-Agents. I managed to get a 200 OK from the site with the following code:
import urllib.request

opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Interwebs Exploiter 4')]
opener.open(url)
Since the file can be a .pdf, a .zip, or another format, I want to download it without parsing or reading it. urlretrieve() seems like a good idea, but it sends the default User-Agent, which makes the server return a 403 Forbidden.
How can I either download the file using that custom-built opener or simply add headers to urlretrieve()?
And this example in the Python Docs is complete gibberish to me.
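For context before the solutions below: urlopen() also accepts a Request object that carries the headers itself, which is roughly what the opener in the snippet above achieves. A minimal sketch, with a placeholder URL:

import urllib.request

# Placeholder URL; substitute the real download target.
url = 'https://example.com/file.pdf'

# A Request object can carry the custom User-Agent directly and is accepted
# by urlopen() in place of a plain URL string.
req = urllib.request.Request(url, headers={'User-Agent': 'Interwebs Exploiter 4'})
with urllib.request.urlopen(req) as response:
    data = response.read()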
Solution 1:[1]
I would use requests for that:
import requests

headers = {'User-Agent': 'Interwebs Exploiter 4'}
r = requests.get(url, allow_redirects=True, headers=headers)
with open(filename, 'wb') as f:
    for chunk in r.iter_content(1024):
        f.write(chunk)
Unless it's absolutely essential for some reason to use urllib.
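For a large .pdf or .zip, a variation of the snippet above can stream the response to disk rather than buffering the whole body first. A minimal sketch, assuming requests is installed; the URL and filename are placeholders:

import requests

# Placeholder values; substitute the real download target.
url = 'https://example.com/file.pdf'
filename = 'file.pdf'

headers = {'User-Agent': 'Interwebs Exploiter 4'}

# stream=True defers downloading the body until iter_content() consumes it,
# so the whole file is never held in memory at once.
r = requests.get(url, headers=headers, stream=True, allow_redirects=True)
r.raise_for_status()  # surface a 403/404 instead of writing an error page to disk
with open(filename, 'wb') as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)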
Solution 2:[2]
Download a URL with urllib.request:
import urllib.request

opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Interwebs Exploiter 4')]
with opener.open(url) as url_file:
    url_content = url_file.read()
Do note that url_file.read() reads the entire file into memory, which might not be what you want if the file is very large.
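If memory is a concern, one way is to copy the response straight to disk in chunks; and because urlretrieve() goes through urlopen() internally, installing the opener globally also covers the "add headers to urlretrieve()" part of the question. A minimal sketch, with a placeholder URL and filename:

import shutil
import urllib.request

# Placeholder values; substitute the real download target.
url = 'https://example.com/file.pdf'
filename = 'file.pdf'

opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Interwebs Exploiter 4')]

# Stream the response body to disk in chunks instead of read()ing it all
# into memory.
with opener.open(url) as response, open(filename, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)

# Alternatively, install the opener globally; urlretrieve() then sends the
# custom User-Agent as well.
urllib.request.install_opener(opener)
urllib.request.urlretrieve(url, filename)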
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | telex-wap |
| Solution 2 | David Foster |
