Data migration from MySQL to SQL Server is taking a huge amount of time using the pandas library

I need to migrate all our historical data from MySQL to SQL Server. The data size is more than 50 GB.

I have created a script to migrate the data from MySQL to SQL Server using the Python pandas library. The main reason for choosing pandas is that I am adding a cleaning step before migration.

def _insert_data_with_dataframe(self, df):

    if len(df) > 0:

        chunk_size = 5000

        # Insert the DataFrame chunk_size rows at a time,
        # including the final partial chunk.
        for start in range(0, len(df), chunk_size):
            chunk = df.iloc[start:start + chunk_size]
            chunk.to_sql('logs', self.engine_sql_staging,
                         if_exists='append', index=False)

I am using the pandas df.to_sql function, and the processing time is very slow. To optimize this, I insert the data in chunks, but the processing time is still huge.

When I run my script on my local machine, it takes 35 minutes to process 1 million rows.

The same script, when run against an AWS server (I am using AWS SQL Server RDS), takes more than half an hour for only 50 thousand rows.

I have checked the AWS console and found that the instance's RAM and CPU usage are normal. So my question is: why is the same query taking so much longer against the AWS server?

My SQL Server version is 2016.



Solution 1:[1]

I am quite sure this will be faster if you try:

def _insert_data_with_dataframe(self, df):

    chunk_size = 5000

    df.to_sql('logs', self.engine_sql_staging,
              if_exists='append', index=False, chunksize=chunk_size)

Consult documentation on to_sql: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html
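As a minimal, self-contained sketch of letting `to_sql` do the chunking (an in-memory SQLite engine stands in for the SQL Server staging engine here, and the table and column names are illustrative):

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite engine stands in for self.engine_sql_staging.
engine = create_engine('sqlite://')

df = pd.DataFrame({'id': range(12000), 'msg': ['x'] * 12000})

# One call; pandas batches the INSERTs 5000 rows at a time internally,
# including the final partial batch -- no manual slicing needed.
df.to_sql('logs', engine, if_exists='append', index=False, chunksize=5000)

with engine.connect() as conn:
    n = pd.read_sql('SELECT COUNT(*) AS n FROM logs', conn)['n'][0]
print(int(n))  # 12000
```

The point of `chunksize` is that pandas handles the batching loop (and the trailing partial chunk) for you, which avoids the bookkeeping bugs that hand-rolled chunk loops tend to have.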

Solution 2:[2]

Use the fast_executemany=True option in your connection engine.

For example, use it as:

engine = create_engine(
    'mssql+pyodbc://{0}/{1}?trusted_connection=yes&driver=SQL+Server+Native+Client+11.0'
    .format(server_name, db_name),
    fast_executemany=True,
)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 karen
Solution 2 Satyam Dahiwal