Pandas not assuming dtypes when using read_sql?

I have a table in SQL that I'm trying to read into a pandas DataFrame. The table reads in fine, but every column dtype comes back as object. When I write the table to a CSV and re-read it with read_csv, the correct data types are inferred. This intermediate step is obviously inefficient; I just want to read the data directly from SQL with the correct data types.

The DataFrame has 650 columns, so manually specifying the data types is not practical.
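Roughly what I'm doing (the connection string and table name are placeholders, assuming a SQLAlchemy engine):

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string and table name.
engine = create_engine("postgresql://user:password@localhost:5432/mydb")

df = pd.read_sql("SELECT * FROM my_table", engine)

# Every column comes back as dtype object.
print(df.dtypes)
```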



Solution 1:[1]

It turns out all the columns in the database are defined as varchar.

It seems read_sql reads the schema and assigns dtypes based on it. What's strange is that I then couldn't convert those dtypes using infer_objects() (which makes some sense in hindsight: infer_objects() only soft-converts object columns whose values already have the right Python type, and varchar data comes back as strings).

The only way to do it was to write the frame to a CSV and then read that CSV back with pd.read_csv().
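A sketch of that workaround, using an in-memory buffer instead of a file on disk; the small frame below is a stand-in for the all-object frame that read_sql returned:

```python
import io

import pandas as pd

# Stand-in for the all-object frame returned by read_sql.
df = pd.DataFrame({"patient_id": ["1", "2"], "score": ["3.5", "4.0"]}, dtype=object)

# Round-trip through CSV text so read_csv's type inference runs;
# a StringIO buffer avoids writing an actual file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
df = pd.read_csv(buffer)

print(df.dtypes)  # patient_id -> int64, score -> float64
```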

Solution 2:[2]

No, read_sql doesn't really check the metadata.

Per the pandas docs, some data types are inferred by default, while others have to be requested explicitly (date formats, for example, need extra guidance).

So a bare pd.read_sql is not fully robust, but it may work on your specific data.

On my Postgres database this looks like:

column_name       postgres                pandas
patient_id        character varying       object
spell_id          character varying       object
spell_start_date  date                    object
spell_start_time  time without time zone  object
spell_end_date    date                    object
spell_end_time    time without time zone  object
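The extra guidance mentioned above can be passed through read_sql's parse_dates argument. A minimal sketch, assuming a SQLAlchemy engine; the connection string and table name are placeholders, and the column names match the table above:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string and table name.
engine = create_engine("postgresql://user:password@localhost:5432/mydb")

# parse_dates runs the named columns through pd.to_datetime,
# so they come back as datetime64 instead of object.
df = pd.read_sql(
    "SELECT * FROM spells",
    engine,
    parse_dates=["spell_start_date", "spell_end_date"],
)

print(df.dtypes)
```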

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1  Tom Smith
Solution 2  Maciej S.