'Pandas to_excel( ) output float point is not right

I have some a weird situation trying to get the output in to_excel pandas function.

I tried to read a xlsx excel file with the number "21631706.9893399", but when a try to write in a new xlsx excel file gives me the output "21631706.98934". I tested with openpyxl and xlsxwriter but I got the same results, however if I tried to write a xls excel with xlwt engine it gives me the right answer, is there a way to read and write the same float number with these engines? I suspect there some float limitations in xlsx file, but in the xlsx input the number is fine.

My code:

import pandas as pd

df = pd.read_excel(r'input\sample.xlsx')

df.to_excel(r'output\excel xlsx - xlsxwriter.xlsx', engine='xlsxwriter', index=False)
df.to_excel(r'output\excel xlsx - openpyxl.xlsx', engine='openpyxl', index=False)
df.to_excel(r'output\excel xls - xlwt.xls', engine='xlwt', index=False)

Github repository with the samples

Dependencies:

python = "3.7.4"
pandas = "1.3.5"
XlsxWriter = "3.0.3"
openpyxl = "3.0.9"
xlwt = "1.3.0"


Solution 1:[1]

Set the format of the float as per your requirement

format = workbook.add_format({'num_format':'0.0000000'}]

For more details: https://xlsxwriter.readthedocs.io/example_pandas_column_formats.html

Solution 2:[2]

The issue that you are seeing is just a symptom of how floating point numbers behave, or more specifically how IEEE754 floating point numbers behave.

Excel and Python (without higher precision libraries) both use IEEE754 "double" floating point numbers which have a general precision of 15 digits. The number in your test case has more than 15 digits (not counting the decimal place) and as a result it gets rounded to a display or storage precision when it is read or written.

For example the number stored in the input file is actually 21631706.989339948:

$ unzip input/sample.xlsx -d input_file
...

$ xmllint --format input_file/xl/worksheets/sheet1.xml | grep -C 2 216
    <row r="2" spans="1:1" x14ac:dyDescent="0.25">
      <c r="A2">
        <v>21631706.989339948</v>
      </c>
    </row>

This number has 17 digits so it cannot be represented without loss of precision in a IEE754 float. Excel reads and displays this as number 21631706.9893399 (as you say above).

Writing this number back out as a text representation is also subject to a loss of precision in the 15+ digits which is what happens:

$ unzip output/excel\ xlsx\ -\ xlsxwriter.xlsx -d xlsxwriter_output
...

$ xmllint --format xlsxwriter_output/xl/worksheets/sheet1.xml | grep -C 2 216
    <row r="2" spans="1:1">
      <c r="A2">
        <v>21631706.98933995</v>
      </c>
    </row>

The xls file behaves differently because it is a binary format and the 64bits that Python has in memory for the IEEE754 double is read/written in same way from/to the file format. This means that it appears more consistent but it doesn't mean that it is more precise since the underlying representation of the double is the same.

So in summary, this behaviour is a function of handling floating point numbers beyond the range of a IEE754 double's precision, and in particular when reading and writing them from/to text files.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Deepak
Solution 2 jmcnamara