How to stop Pandas converting integer to decimal when reading in an .xlsx file?
I have an .xlsx file that I am loading into a dataframe using the pd.read_excel method. However, when I do so, one of my columns appears to change format, with pandas adding a decimal point. Does anyone know why this is happening and how to stop it please?
Example of data in the .xlsx file:
191001
191002
191003
Example of the same data in the dataframe:
191001.0
191002.0
191003.0
The relevant column is using the 'General' format option in Excel.
I tried removing the decimal point with the following method; however I got the error message "pandas.errors.IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer".
df.column1 = df.column1.astype(int)
Any help would be appreciated!
Solution 1:[1]
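Your column most likely contains missing (NaN) or infinite values, for example from blank cells. A plain integer column cannot hold these, so pandas reads the column as float, which is also why astype(int) raises the error.
You will need to replace or remove them first. Note that the code below fills every missing value in the dataframe with 0: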
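import numpy as np
# Treat infinities as missing, then fill every missing value with 0
df.replace([np.inf, -np.inf], np.nan, inplace=True)
df.fillna(0, inplace=True)
# With no NaN or inf left, the cast to int succeeds
df.column1 = df.column1.astype(int)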
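If filling the blanks with 0 would distort the data, one alternative is pandas' nullable integer dtype "Int64" (capital I), which keeps whole numbers as integers and shows missing entries as <NA>. The sketch below is only an illustration: the dataframe is a hypothetical stand-in for the column read from the .xlsx file, and it assumes the column holds whole numbers and blanks only, with no infinite values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the column loaded by pd.read_excel;
# the blank cell arrives as NaN, which forces the float64 dtype.
df = pd.DataFrame({"column1": [191001, 191002, np.nan, 191003]})
print(df["column1"].dtype)

# Cast to the nullable integer dtype: whole numbers lose the ".0"
# and the missing entry is kept as <NA> instead of being replaced.
df["column1"] = df["column1"].astype("Int64")
print(df["column1"])
```

The missing row stays missing here, so it can still be dropped or filled later with dropna() or fillna() if needed.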
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ArchAngelPwn |
