'Python: How to get the Content-Type of an URL?
I need to get the content-type of an internet(intranet) resource not a local file. How can I get the MIME type from a resource behind an URL:
I tried this:
res = urllib.urlopen("http://www.iana.org/assignments/language-subtag-registry")
http_message = res.info()
message = http_message.getplist()
I get:
['charset=UTF-8']
How can I get the Content-Type, can be done using urllib and how or if not what is the other way?
Solution 1:[1]
A Python3 solution to this:
import urllib.request
with urllib.request.urlopen('http://www.google.com') as response:
info = response.info()
print(info.get_content_type()) # -> text/html
print(info.get_content_maintype()) # -> text
print(info.get_content_subtype()) # -> html
Solution 2:[2]
Update: since info() function is depreciated in Python 3.9, you can read about the preferred type called headers here
import urllib
r = urllib.request.urlopen(url)
header = r.headers # type is email.message.EmailMessage
contentType = header.get_content_type() # or header.get('content-type')
contentLength = header.get('content-length')
filename = header.get_filename()
also, a good way to quickly get the mimetype without actually loading the url
import mimetypes
contentType, encoding = mimetypes.guess_type(url)
The second method does not guarantee an answer but is a quick and dirty trick since it's just looking at the URL string rather than actually opening the URL.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | amzshow |
