Downloading a CSV that requires authentication from an email link
I'm a very junior developer, tasked with automating the creation, download, and transformation of a query from Stripe Sigma.
I've been able to get the bulk of my job done: I have daily scheduled queries that generate a report for the prior 24 hours, linked to a dummy account used purely for those reports, and I've got the transformation and reporting done on the back half of this problem.
The roadblock I've run into, though, is getting this code to pull the CSV that manually clicking the link generates.
import re
import traceback

import requests
from bs4 import BeautifulSoup
from imbox import Imbox  # pip install imbox

mail = Imbox(host, username=username, password=password, ssl=True, ssl_context=None, starttls=False)
messages = mail.messages(unread=True)

message_list = []
for (uid, message) in messages:
    body = str(message.body.get('html'))
    message_list.append(body)
mail.logout()
def get_download_link(message):
    print(message[0])
    soup = BeautifulSoup(message, 'html.parser')
    urls = []
    for link in soup.find_all('a'):
        print(link.get('href'))
        urls.append(link.get('href'))
    return urls[1]  # the second link in the email is the CSV download
    # return urls
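Picking `urls[1]` by position is fragile if Stripe ever changes the email layout. A stdlib-only sketch of a more robust alternative, selecting the link by its visible anchor text instead of its index (the phrase `'download'` is an assumption about the email's wording, not something confirmed by the post):

```python
from html.parser import HTMLParser

class LinkFinder(HTMLParser):
    """Collect hrefs of <a> tags whose visible text contains a target phrase."""
    def __init__(self, target_text):
        super().__init__()
        self.target = target_text.lower()
        self.current_href = None
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.current_href = dict(attrs).get('href')

    def handle_data(self, data):
        # Only record the href while we are inside an <a> whose text matches.
        if self.current_href and self.target in data.lower():
            self.matches.append(self.current_href)

    def handle_endtag(self, tag):
        if tag == 'a':
            self.current_href = None

def find_link(html, text='download'):
    """Return the first link whose anchor text contains `text`, or None."""
    finder = LinkFinder(text)
    finder.feed(html)
    return finder.matches[0] if finder.matches else None
```

This keeps the extraction working even if extra links are added before the download link, at the cost of depending on the anchor text instead.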
dl_urls = []
for m in message_list:
    dl_urls.append(get_download_link(m))
from requests.auth import HTTPDigestAuth

for url in dl_urls:
    print(url)
    try:
        s = requests.Session()
        s.auth = (username, password)
        response = s.get(url, allow_redirects=True, auth=(username, password))
        # print(response.headers)
        if response.status_code == requests.codes.ok:
            print('response headers', response.headers['content-type'])
            response = requests.get(url, allow_redirects=True, auth=HTTPDigestAuth(username, password))
            # print(response.text)
            print(response.content)
            # open(filename, 'wb').write(response.content)
        else:
            print("invalid status code", response.status_code)
    except Exception:
        print('problem with url', url)
        traceback.print_exc()
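One way to make the HTML-vs-CSV failure explicit is to check the response headers before writing anything to disk. A small, hedged helper along those lines (the header values shown are illustrative, not taken from Stripe's actual responses):

```python
def looks_like_csv(headers):
    """Heuristic check: a real report download should not be an HTML page,
    and is usually served as text/csv or as an attachment."""
    ctype = headers.get('Content-Type', headers.get('content-type', ''))
    if 'text/html' in ctype:
        return False  # we got a web page (e.g. a login screen), not the file
    disposition = headers.get('Content-Disposition', '')
    return 'csv' in ctype or 'attachment' in disposition
```

Calling `looks_like_csv(response.headers)` before the `open(filename, 'wb')` line would prevent silently saving a Stripe login page as a `.csv` file.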
I'm working on this in Jupyter notebooks. I've tried to include only the relevant code, detailing how I got into the email, how I extracted URLs from that email, and which one, when clicked, downloads the CSV.
All the way until the last step I've had remarkably good luck, but now the URL that I manually click downloads the CSV as expected, while that same URL is treated as the HTML for a Stripe page by python/requests.
I've tried poking around in the headers. The one header that was suggested on another post ('Content-Disposition') wasn't present, and printing the headers that are present takes up a good 20-25 lines.
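Rather than scanning 20-25 lines of headers by eye, it may help to filter down to the handful that matter for diagnosing a download. A hypothetical helper (the header list is an assumption about which fields are useful here):

```python
# Headers most relevant to diagnosing why a download came back as HTML.
INTERESTING = ('content-type', 'content-disposition', 'content-length', 'location')

def summarize_headers(headers):
    """Return only the response headers relevant to a file download."""
    return {k: v for k, v in headers.items() if k.lower() in INTERESTING}
```

For example, `print(summarize_headers(response.headers))` after the `s.get(...)` call would show at a glance whether the response is HTML and whether a redirect (`Location`) was involved.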
Any suggestions on either headers that could contain the CSV, or other approaches I could take, would be appreciated.
I've included an (intentionally broken) URL to show the rough format of what works for manual download but not when kept entirely in Python.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow