Is there any way to improve to_sql performance in Python?

I have a loop that goes through each file (xlsx and csv) in a directory, reads it, and appends it to a list, which is then concatenated into one DataFrame and sent to a database. It works, but with bigger files it can take a lot of time. I figure this is the part of my code I need to improve:

        xls.to_sql(table, con=engine, if_exists='append', index=False)
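
For context, the surrounding loop looks roughly like this (the directory, table name, and engine URL are placeholders, not my real values):

    import pandas as pd
    from pathlib import Path
    from sqlalchemy import create_engine

    engine = create_engine("mssql+pyodbc://user:pass@my_dsn")  # placeholder URL
    table = "staging_table"                                    # placeholder name

    frames = []
    for path in Path("data").iterdir():
        if path.suffix == ".xlsx":
            frames.append(pd.read_excel(path))
        elif path.suffix == ".csv":
            frames.append(pd.read_csv(path))

    # Concatenate everything that was read and push it in one call.
    xls = pd.concat(frames, ignore_index=True)
    xls.to_sql(table, con=engine, if_exists="append", index=False)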

I've been testing different chunksizes, but how can I choose the optimal chunksize when the files vary so much in size?

I've also been studying parallelization and was wondering if anyone could help me figure out how to improve this.
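
To be concrete about the parallelization idea: what I have in mind is one worker process per file, each creating its own engine, since connections can't be shared across processes. A rough, untested sketch with placeholder names:

    from concurrent.futures import ProcessPoolExecutor
    from pathlib import Path

    import pandas as pd
    from sqlalchemy import create_engine

    DB_URL = "mssql+pyodbc://user:pass@my_dsn"  # placeholder URL
    TABLE = "staging_table"                     # placeholder name

    def load_one(path):
        # Each worker builds its own engine; engines are not picklable.
        engine = create_engine(DB_URL)
        df = pd.read_excel(path) if path.suffix == ".xlsx" else pd.read_csv(path)
        df.to_sql(TABLE, con=engine, if_exists="append", index=False, chunksize=1000)
        engine.dispose()
        return len(df)

    if __name__ == "__main__":
        files = [p for p in Path("data").iterdir() if p.suffix in (".xlsx", ".csv")]
        with ProcessPoolExecutor(max_workers=4) as pool:
            total = sum(pool.map(load_one, files))
        print(f"loaded {total} rows")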

EDIT: By the way, I've also followed this blog post to improve my database connection, and I noticed a lot of performance improvements from it. But to_sql is still the slow part.

https://towardsdatascience.com/dramatically-improve-your-database-inserts-with-a-simple-upgrade-6dfa672f1424
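
As I understand it, the main change that post suggests is enabling pyodbc's fast_executemany on the engine; a minimal sketch of what I did, assuming SQL Server via pyodbc (connection string is a placeholder):

    from sqlalchemy import create_engine

    # fast_executemany is a SQLAlchemy option for the mssql+pyodbc dialect
    # (SQLAlchemy 1.3+); it makes pyodbc send parameters in bulk.
    engine = create_engine(
        "mssql+pyodbc://user:pass@my_dsn",
        fast_executemany=True,
    )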



Solution 1:[1]

This is what we did in our project:

    chunk_size = 2100 // len(df.columns) - 1

We also passed method='multi' to to_sql, and that has worked out well for us. method='multi' speeds things up quite a bit because it packs many rows into a single multi-values INSERT statement instead of issuing one INSERT per row. The 2100 in the formula comes from SQL Server's limit of 2100 parameters per statement, so the chunk size is simply the number of rows whose parameters fit under that cap.
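
A condensed sketch of that setup (connection string and table name are placeholders, not what we actually use):

    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mssql+pyodbc://user:pass@my_dsn")  # placeholder
    df = pd.read_csv("data.csv")                               # any DataFrame

    # SQL Server caps a statement at 2100 parameters, so keep
    # rows-per-chunk * columns under that limit.
    chunk_size = 2100 // len(df.columns) - 1

    df.to_sql(
        "my_table",           # placeholder table name
        con=engine,
        if_exists="append",
        index=False,
        method="multi",       # many rows per INSERT statement
        chunksize=chunk_size,
    )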

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1 by NYC Coder