Downloading large files (10GB+) from S3 in Python fails when the Athena SAML token expires

I am still learning Python (3.6) and am now working on AWS. I am trying to automate a process wherein a user runs a query in Athena, the query results are written to an S3 bucket, and I then pull the file from S3 to my local machine to run further analysis with legacy tools. All of this is currently done manually, step by step, starting with firing the query in the Athena Query Editor.

The problem I am facing is that the file(s) will be larger than 10GB and the SAML profile token expires after 1 hour. I have read some documentation about auto-refreshing the credentials, but I am not sure how to implement a solution like that while the file is being downloaded. My current code is further below (that is the closest I have got to a successful run, with about 10,000 records).
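From what I have read, the refresh approach would mean replacing the session's static credentials with botocore's RefreshableCredentials, which re-invokes a callback shortly before the keys expire. Below is my untested sketch of that idea; get_new_saml_credentials is a made-up placeholder for whatever actually re-runs the SAML flow, and assigning to the session's private _credentials attribute is a workaround I saw suggested, not an official API:

import boto3
import botocore.session
from botocore.credentials import RefreshableCredentials

def refresh_saml_credentials():
    # Hypothetical helper: re-runs the SAML login and returns fresh keys.
    creds = get_new_saml_credentials()  # placeholder, does not exist
    return {
        "access_key": creds["AccessKeyId"],
        "secret_key": creds["SecretAccessKey"],
        "token": creds["SessionToken"],
        # ISO 8601 string (or aware datetime); botocore refreshes
        # automatically shortly before this time is reached
        "expiry_time": creds["Expiration"],
    }

credentials = RefreshableCredentials.create_from_metadata(
    metadata=refresh_saml_credentials(),
    refresh_using=refresh_saml_credentials,
    method="saml-refresh",
)
botocore_session = botocore.session.get_session()
botocore_session._credentials = credentials  # private attribute workaround
session = boto3.Session(botocore_session=botocore_session)

What I do not understand is whether a single long-running download_file transfer would actually pick up the refreshed keys mid-transfer, or whether the transfer fails as soon as the original token dies.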

Any suggestions/help is appreciated.

import boto3
from boto3.s3.transfer import TransferConfig
import pandas as pd
import time

pd.set_option('display.max_rows', 50)
pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 1000)

session=boto3.Session(profile_name='saml')
athena_client = session.client("athena")

query_response = athena_client.start_query_execution(
    QueryString="SELECT * FROM TABLENAME WHERE <condition>",
    QueryExecutionContext={"Database": 'some_db'},
    ResultConfiguration={
        "OutputLocation": 's3://131653427868-heor-epi-workbench-results',
        "EncryptionConfiguration": {"EncryptionOption": "SSE_S3"},
    },
    WorkGroup='myworkgroup',
)
print(query_response)

iteration = 30

temp_file_location: str = "C:\\Users\\<user>\\Desktop\\Python Projects\\tablename.csv"

while iteration > 0:
    iteration -= 1
    print(iteration)

    query_response_id = athena_client.get_query_execution(
        QueryExecutionId=query_response['QueryExecutionId']
    )
    state = query_response_id['QueryExecution']['Status']['State']
    print(query_response_id)

    if state in ('FAILED', 'CANCELLED'):
        print("IF BLOCK: ", state)
        print("The Query Failed.")
        raise SystemExit(1)  # do not fall through to the download/read below

    elif state == 'SUCCEEDED':
        print("ELSE IF BLOCK: ", state)
        print("Query Completed. Ready to download.")

        print("Proceeding to Download File......")

        config = TransferConfig(max_concurrency=5)

        s3_client = session.client("s3")
        s3_client.download_file(
            '131653427868-heor-epi-workbench-results',
            f"{query_response['QueryExecutionId']}.csv",
            temp_file_location,
            Config=config,
        )

        print("Download complete. Setting Iteration to 0 to exit loop. ")
        iteration = 0

    else:
        print("ELSE BLOCK: ", state)
        time.sleep(10)

pandasDF = pd.read_csv(temp_file_location)
print(pandasDF)
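
The other direction I am considering is to avoid one long download_file call entirely and instead fetch the object in byte ranges, so that every request is short-lived and a client with fresh credentials can be created between chunks. A rough, untested sketch (the 256MB chunk size is an arbitrary choice, and this only helps if a new Session actually picks up refreshed SAML credentials):

import boto3

def download_in_ranges(bucket, key, dest, chunk_size=256 * 1024 * 1024):
    s3 = boto3.Session(profile_name='saml').client("s3")
    size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]

    with open(dest, "wb") as f:
        start = 0
        while start < size:
            end = min(start + chunk_size, size) - 1
            # Recreate the client each chunk so it uses whatever
            # credentials the profile currently holds.
            s3 = boto3.Session(profile_name='saml').client("s3")
            response = s3.get_object(
                Bucket=bucket, Key=key, Range=f"bytes={start}-{end}"
            )
            f.write(response["Body"].read())
            start = end + 1

download_in_ranges(
    '131653427868-heor-epi-workbench-results',
    f"{query_response['QueryExecutionId']}.csv",
    temp_file_location,
)

I do not know if this is the right way to go about it, or if there is a cleaner mechanism in boto3 for resuming a transfer after re-authenticating.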

