How can I get the content of a URL and write it into a new file using HTMLSession in Python?
With requests (as commonly used alongside BeautifulSoup), we use response.content to get the raw body of the URL and write it to a new file. What should we use instead if we fetch the page with HTMLSession from requests_html?
For example:
import requests
from urllib.parse import urlparse
from requests_html import HTMLSession

session = HTMLSession()
pdf_title = "xyz.pdf"  # name of the output file
# URL of the PDF to download
URL = "https://academic.oup.com/qje/article/126/4/1593/17089543/qjr041.pdf"
r = session.get(URL, allow_redirects=True)
with open(pdf_title, "wb") as new_pdf:
    print(f"Begin writing to {pdf_title}")
    new_pdf.write(r.html.content)  # This line is not working
Solution 1:[1]
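This is all you need: plain requests is enough, and the downloaded bytes are in r.content. However, when I run it, the server responds with "request forbidden by administrative rules". Presumably you have credentials or access that get you past this.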
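import requests

pdf_title = "xyz.pdf"
URL = "https://academic.oup.com/qje/article/126/4/1593/17089543/qjr041.pdf"
r = requests.get(URL, allow_redirects=True)
r.raise_for_status()  # stop on 4xx/5xx instead of saving an error page as the PDF
with open(pdf_title, "wb") as new_pdf:
    new_pdf.write(r.content)  # r.content is the raw response body as bytes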
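If you want to keep using HTMLSession, the same attribute should work: HTMLSession.get returns an HTMLResponse, which subclasses requests.Response, so the raw bytes are in r.content. r.html is the parsed-HTML wrapper (exposing attributes such as html, raw_html and text), which is why r.html.content fails. The following is a minimal sketch under those assumptions; the helper name save_response_bytes and its session parameter are hypothetical, added so the function can be exercised with any object that has a requests-style get(), and it assumes requests_html is installed when no session is passed.

```python
from pathlib import Path


def save_response_bytes(url, path, session=None):
    """Hypothetical helper: download `url` and write the raw body to `path`.

    `session` is any object with a requests-style .get(); if omitted, a
    requests_html.HTMLSession is created (assumes requests_html is installed).
    Returns the number of bytes written.
    """
    if session is None:
        # Imported lazily so the helper can be used with other sessions
        from requests_html import HTMLSession
        session = HTMLSession()
    r = session.get(url, allow_redirects=True)
    # Raise on 4xx/5xx instead of saving an error page under a .pdf name
    r.raise_for_status()
    # HTMLResponse subclasses requests.Response, so the bytes are in
    # r.content; r.html is the parsed-HTML wrapper, not the raw body.
    data = r.content
    Path(path).write_bytes(data)
    return len(data)
```

For the question's example this would be called as save_response_bytes(URL, pdf_title). If the server rejects the request with an error status, as reported in the answer, raise_for_status() raises requests.HTTPError and no file is written.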
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tim Roberts |
