Pandas to_sql() slow on one DataFrame but fast on others
Goal
I'm trying to use pandas DataFrame.to_sql() to send a large DataFrame (>1M rows) to an MS SQL server database.
Problem
The command is significantly slower on one particular DataFrame, taking about 130 seconds to send 10,000 rows. In contrast, a similar DataFrame takes just 7 seconds to send the same number of rows. The faster DataFrame actually has more columns, and more data as measured by df.memory_usage(deep=True).
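In case it matters, this is roughly how I compared the two frames (df_slow and df_fast are just placeholder names here for the problematic and the fast DataFrame):

```python
import pandas as pd

def summarize(df: pd.DataFrame, name: str) -> None:
    """Print shape, dtype mix, and deep memory footprint for comparison."""
    print(name, df.shape)
    print(df.dtypes.value_counts())
    print(f"{df.memory_usage(deep=True).sum() / 1e6:.1f} MB")

# summarize(df_slow, 'slow')  # the DataFrame that takes ~130 s per 10,000 rows
# summarize(df_fast, 'fast')  # the DataFrame that takes ~7 s per 10,000 rows
```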
Details
The SQLAlchemy engine is created via
engine = create_engine('mssql+pyodbc://@<server>/<db>?driver=ODBC+Driver+17+for+SQL+Server', fast_executemany=True)
The to_sql() call is as follows:
df[i:i+chunksize].to_sql(table, conn, index=False, if_exists='replace')
where chunksize = 10000.
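Putting the pieces together, the upload loop looks roughly like this (df and table are defined elsewhere in the script; engine is the one created above):

```python
# Rough shape of the upload loop; df and table come from earlier in the script.
chunksize = 10000
with engine.connect() as conn:
    for i in range(0, len(df), chunksize):
        df[i:i + chunksize].to_sql(table, conn, index=False, if_exists='replace')
```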
I've attempted to locate the bottleneck via cProfile, but this only revealed that nearly all of the time is spent in pyodbc.Cursor.executemany.
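For reference, the profiling was done along these lines (upload_one_chunk is just a throwaway wrapper around a single chunk):

```python
import cProfile
import pstats

def upload_one_chunk():
    # One 10,000-row chunk, using the same engine/connection as above.
    df[:10000].to_sql(table, conn, index=False, if_exists='replace')

cProfile.run('upload_one_chunk()', 'to_sql.prof')
stats = pstats.Stats('to_sql.prof')
stats.sort_stats('cumulative').print_stats(20)  # pyodbc.Cursor.executemany dominates
```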
Any tips for debugging would be appreciated!
Sources
This question is from Stack Overflow and is licensed under CC BY-SA 3.0, per Stack Overflow's attribution requirements.