Export 100 million rows from Teradata to Snowflake using Python

What is the best approach to export hundreds of millions of rows from Teradata to Snowflake? I am currently using to_csv(), but it takes more than 2 hours to write the data from the dataframe to CSV. Is there a faster/better approach that can be used to improve performance and efficiency? Also, with to_csv() the data appears to be corrupted in some places. Any solution to that?

Code -

query = "SELECT * FROM " + table_name
df = pd.read_sql(query, connect) 
df.to_csv(path, index=False, encoding='utf-8')


Solution 1:[1]

Got the solution by using a PySpark approach.

Sol 1: For Teradata 16.00 and above, only one JAR is needed to connect the PySpark APIs to Teradata and establish the connection. Once connected, you can use repartition() to divide the dataframe into batches and export it to CSV as required (see the sketch below).
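
As a rough illustration of Sol 1, here is a minimal PySpark sketch. It assumes the Teradata JDBC driver JAR is available on the local filesystem, and the host, credentials, table name, output path, and partition count are all placeholders you would replace for your environment.

from pyspark.sql import SparkSession

# Single JAR is enough to talk to Teradata over JDBC (placeholder path).
spark = (
    SparkSession.builder
    .appName("teradata-export")
    .config("spark.jars", "/path/to/terajdbc4.jar")
    .getOrCreate()
)

# Read the source table through the Teradata JDBC driver.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:teradata://<host>/DATABASE=<db>")
    .option("driver", "com.teradata.jdbc.TeraDriver")
    .option("dbtable", "<table_name>")
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)

# Repartition so the export is written as many smaller CSV part files instead of one huge file.
df.repartition(100).write.option("header", "true").csv("/path/to/export_dir")

Writing many part files in parallel is what gives the speed-up over a single pandas to_csv() call; the partition count is something to tune for your row count and cluster size.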

Sol 2: Teradata provides a data streaming option using TPT directly into AWS or Azure storage. For dumping data into an AWS S3 bucket, you can check the given link to understand the data connectors and related questions. Link - https://docs.teradata.com/r/p~0sSD4zl4K8YPbEGnM3Rg/SRIygNBx15FwMJ333zg_HA
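
Once the CSV files have landed in S3 (whether via TPT or the PySpark export above), they can be loaded into Snowflake with a COPY INTO statement. Below is a minimal sketch using the snowflake-connector-python package; the account, credentials, bucket path, table name, and file-format options are placeholders, not values from the original answer.

import snowflake.connector

# Placeholder connection parameters; replace with your Snowflake account details.
conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    warehouse="<warehouse>",
    database="<database>",
    schema="<schema>",
)

cur = conn.cursor()
try:
    # Bulk-load all staged CSV part files from S3 into the target table.
    cur.execute("""
        COPY INTO <target_table>
        FROM 's3://<bucket>/<export_dir>/'
        CREDENTIALS = (AWS_KEY_ID='<key>' AWS_SECRET_KEY='<secret>')
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"')
    """)
finally:
    cur.close()
    conn.close()

COPY INTO loads the staged files in parallel on the warehouse, which scales much better for hundreds of millions of rows than inserting through a dataframe.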

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 starlord