Limit writing of pandas to_excel to 1 million rows per sheet

I have a DataFrame with around 28 million rows (5 columns), and I'm struggling to write it to Excel, which is limited to 1,048,576 rows per sheet. I can't spread the data across more than one workbook, so I'll need to split those 28 million rows into 28 sheets, and so on.

This is what I'm doing with it:

writer = pd.ExcelWriter('NAME_XX-' + mes +'-' + ano + '_XXX.xlsx', engine = "xlsxwriter")

notMor.to_excel(writer, engine='xlsxwriter', index=False)

and getting this error:

[screenshot of the error message]

I thought of splitting the DataFrame into 27 different ones and then saving each one to the workbook as a separate sheet, but isn't there a simpler way?



Solution 1:[1]

  1. In Excel 2016, and Excel for Microsoft 365, use Data > Get & Transform Data > Get Data to import data from any number of external data sources, such as a text file, Excel workbook, website, Microsoft Access, SQL Server, or another relational database that contains multiple related tables.
    • In Excel 2013 and 2010, go to Power Query > Get External Data, and select your data source.
  2. Excel prompts you to select a table. If you want to get multiple tables from the same data source, check the Enable selection of multiple tables option. When you select multiple tables, Excel automatically creates a Data Model for you.
  3. Select one or more tables, then click Load.

  • The example in the original screenshots loads 3M rows this way.
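
If you go this route, the data still has to come out of pandas first. A CSV file itself has no 1,048,576-row limit, so a minimal sketch (assuming the notMor DataFrame and the mes/ano filename variables from the question) would be:

# Sketch: write everything to one CSV, then import it in Excel via
# Data > Get & Transform Data and load it to the Data Model rather than a worksheet.
notMor.to_csv('NAME_XX-' + mes + '-' + ano + '_XXX.csv', index=False)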

Solution 2:[2]

Simple, easy, and fast: dump your data from pandas to CSV files split at 1,000,000 rows each, then have Excel open them. Excel will adjust and open them across multiple sheets accordingly.

Some code you can tweak:

# split Plan.txt into files of at most 1,000,000 lines each
lines_per_file = 1000000
smallfile = None
with open('Plan.txt') as bigfile:
    for lineno, line in enumerate(bigfile):
        if lineno % lines_per_file == 0:
            # start a new chunk every 1,000,000 lines
            if smallfile:
                smallfile.close()
            small_filename = 'small_file_{}.txt'.format(lineno + lines_per_file)
            smallfile = open(small_filename, "w")
        smallfile.write(line)
    if smallfile:
        smallfile.close()
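
If the data is still in a DataFrame, the same 1,000,000-row split can be done directly from pandas without going through Plan.txt; a minimal sketch, assuming the notMor DataFrame from the question:

# Sketch: write the DataFrame out as CSV files of at most 1,000,000 rows each
rows_per_file = 1000000
for i, start in enumerate(range(0, len(notMor), rows_per_file)):
    notMor.iloc[start:start + rows_per_file].to_csv(
        'small_file_{}.csv'.format(i), index=False)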

Solution 3:[3]

I had the same problem and implemented the following. I hope it helps you.

from math import floor

# data, columns and filename are assumed to be defined already
rows_per_sheet = 100000
number_of_sheets = floor(len(data) / rows_per_sheet) + 1
start_index = 0
end_index = rows_per_sheet
writer = pd.ExcelWriter(filename)
for i in range(number_of_sheets):
    # write the next slice of rows to its own sheet
    df = pd.DataFrame(list(data[start_index:end_index]), columns=columns)
    df.to_excel(writer, index=False, sheet_name='sheet_' + str(i))
    start_index = end_index
    end_index = end_index + rows_per_sheet

writer.close()  # close() also saves the file; ExcelWriter.save() was removed in pandas 2.0

However, creating multiple sheets in the same file can cost a lot of time, because each write has to deal with the whole workbook, which grows every time a sheet is added. In that case you can create multiple Excel files instead of multiple sheets:

rows_per_file = 1000000
number_of_files = floor(len(data) / rows_per_file) + 1
start_index = 0
end_index = rows_per_file
df = pd.DataFrame(list(data), columns=columns)
for i in range(number_of_files):
    # write each chunk to its own workbook
    filepart = 'file' + '_' + str(i) + '.xlsx'
    writer = pd.ExcelWriter(filepart)
    df_mod = df.iloc[start_index:end_index]
    df_mod.to_excel(writer, index=False, sheet_name='sheet')
    start_index = end_index
    end_index = end_index + rows_per_file
    writer.close()  # saves the current workbook
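
The same loop can also be written with ExcelWriter as a context manager, which closes (and saves) each workbook automatically; a minimal sketch, reusing df, number_of_files, and rows_per_file from the block above:

# Sketch: one workbook per chunk, closed automatically by the context manager
for i in range(number_of_files):
    part = df.iloc[i * rows_per_file:(i + 1) * rows_per_file]
    with pd.ExcelWriter('file' + '_' + str(i) + '.xlsx') as writer:
        part.to_excel(writer, index=False, sheet_name='sheet')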

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1 (Stack Overflow)
[2] Solution 2 (Stack Overflow)
[3] Solution 3 (Stack Overflow)