How to stop pandas converting integers to decimals when reading an .xlsx file?

I have an .xlsx file that I am loading into a DataFrame with pd.read_excel. However, when I do so, one of my columns changes format: pandas adds a decimal point to every value. Does anyone know why this happens and how to stop it?

Example of data in the .xlsx file:

191001
191002
191003

Example of the same data in the dataframe:

191001.0
191002.0
191003.0

The relevant column is using the 'General' format option in Excel.

I tried removing the decimal point with the following code, but got the error message "pandas.errors.IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer".

df.column1 = df.column1.astype(int)

Any help would be appreciated!



Solution 1:[1]

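The column most likely contains missing (NaN) or infinite values, for example empty cells in the sheet. A NumPy int64 column cannot hold NaN, so pandas stores the whole column as float64, which is why the values are displayed as 191001.0 and so on. The same non-finite values are what make astype(int) fail.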
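
A minimal sketch of the cause, assuming a small in-memory DataFrame stands in for the Excel data (the column name `column1` comes from the question; the values are made up): one empty cell becomes NaN, the column is stored as float64, and casting it to int raises the same error. It also shows one way to list the offending rows with `np.isfinite` before deciding how to handle them.

```python
import numpy as np
import pandas as pd

# Stand-in for the sheet (illustrative values): one cell is empty, which pandas reports as NaN
df = pd.DataFrame({"column1": [191001, 191002, None, 191003]})

# NaN cannot be stored in a NumPy int64 column, so pandas falls back to float64
print(df["column1"].dtype)  # float64

# Locate the rows that block the cast: NaN, inf and -inf are all non-finite
bad_rows = df[~np.isfinite(df["column1"])]
print(bad_rows)

try:
    df["column1"].astype(int)
except ValueError as exc:  # IntCastingNaNError is a subclass of ValueError
    print("cast failed:", exc)
```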

You will need to replace those NaN and infinite values before casting the column to int:

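import numpy as np

# Turn +/-inf into NaN so they are handled together with the empty cells
df.replace([np.inf, -np.inf], np.nan, inplace=True)
# Replace every NaN with 0 (note: this applies to all columns of df)
df.fillna(0, inplace=True)
df.column1 = df.column1.astype(int)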
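
Because `df.fillna(0, inplace=True)` fills every column and turns missing IDs into real zeros, a column-scoped variant may be preferable. The sketch below assumes a `column1` containing one NaN and one inf (illustrative values) and shows two options: fill only that column with 0, or cast to pandas' nullable `Int64` dtype (capital I), which keeps the integers and represents missing cells as `<NA>`. The nullable dtype is a different technique from the answer above, offered here as an alternative, not as part of it.

```python
import numpy as np
import pandas as pd

# Illustrative data: an empty cell (NaN) and an infinite value in column1
df = pd.DataFrame({"column1": [191001, np.nan, np.inf, 191003]})

# Option 1: clean only column1 (unlike df.fillna, no other column is touched)
filled = (
    df["column1"]
    .replace([np.inf, -np.inf], np.nan)  # treat inf like a missing value
    .fillna(0)                           # missing IDs become 0
    .astype(int)
)

# Option 2: nullable integer dtype; inf still has to be removed first,
# but missing cells stay missing (<NA>) instead of becoming 0
nullable = df["column1"].replace([np.inf, -np.inf], np.nan).astype("Int64")

print(filled.tolist())  # [191001, 0, 0, 191003]
print(nullable)
```

Which option fits depends on whether a 0 could be mistaken for a real value in this column; the nullable dtype avoids that by keeping the gaps visible.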

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution source:
Solution 1: ArchAngelPwn