How to preserve dtype int when reading integers with NaN in pandas

The question might sound silly but I am interested to know the following:

my_df = pd.read_sql_query(sql_script.read(), engine).astype(object) 

With pandas 1.0.5, this line keeps the NaNs in an integer column from turning the column into floats, whereas with pandas 1.3.5 .astype(object) does absolutely nothing.

I am curious why this is and, of course, what the best approach is for keeping the data obtained from SQL as is, without converting it to floats (which happens where columns have NaNs, because NaNs are floats).

Thank you in advance!



Solution 1:[1]

Use dtype 'Int64' for NaN support

  • 'Int64' (capital I) is a pandas nullable integer dtype, so it can hold integers alongside missing values (shown as <NA>).
  • Default numpy integers cannot mix with NaNs, so the column will be upcast to float64.
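
The difference between the two dtypes can be seen on a small Series. This is only a sketch with made-up values (the names values, as_default, as_nullable and as_object are illustrative and not from the question):

```python
import numpy as np
import pandas as pd

values = [1, 2, None]  # made-up sample data: two integers and one missing value

# Default inference: numpy integers cannot hold a missing value,
# so the whole Series is upcast to float64 and None becomes NaN.
as_default = pd.Series(values)

# Nullable integer dtype: the integers stay integers, None becomes <NA>.
as_nullable = pd.Series(values, dtype="Int64")

# Casting the float Series to object changes the dtype label only;
# the stored values are still floats (1.0, 2.0, nan).
as_object = as_default.astype(object)

print(as_default.dtype, as_nullable.dtype, as_object.dtype)
```

Once a column has already been upcast to float64, casting it to object does not bring the integers back, while casting it to 'Int64' does as long as every non-missing value is a whole number.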

For example, say column Col D contains only integers and NaNs:

  • Either use the dtype param at load time (available in most read_* methods; read_sql_query accepts it from pandas 1.3.0):

    df = pd.read_sql_query(sql_script.read(), engine, dtype={'Col D': 'Int64'})
    #                                                 ^^^^^
    
  • Or use astype after loading:

    df = pd.read_sql_query(sql_script.read(), engine).astype({'Col D': 'Int64'})
    #                                                 ^^^^^^
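
Both options can be tried end to end without the original database. The sketch below assumes an in-memory SQLite table as a stand-in for the real engine; the table name t, the sample rows and the variable names are hypothetical, and the dtype argument to read_sql_query needs pandas 1.3.0 or later:

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the real database: one integer column with a NULL.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("Col D" INTEGER)')
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

query = 'SELECT "Col D" FROM t'

# Plain read: the NULL forces the column to float64.
raw = pd.read_sql_query(query, conn)

# Option 1: request the nullable dtype at load time (pandas >= 1.3.0).
at_load = pd.read_sql_query(query, conn, dtype={"Col D": "Int64"})

# Option 2: convert after loading; works because every non-missing
# value in the float column is a whole number.
converted = raw.astype({"Col D": "Int64"})

conn.close()

print(raw.dtypes["Col D"], at_load.dtypes["Col D"], converted.dtypes["Col D"])
```

Either way the column ends up as Int64 with the NULL represented as <NA>. Passing dtype at load time avoids the intermediate float column, which is the safer choice for integers too large to be represented exactly as float64.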
    

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1