Pandas not assuming dtypes when using read_sql?
I have a table in SQL that I'm looking to read into a pandas DataFrame. I can read the table in, but every column's dtype comes back as object. When I write the table to a CSV and then re-read it with read_csv, the correct data types are inferred. Obviously this intermediate step is inefficient, and I just want to read the data directly from SQL with the correct data types inferred.
The DataFrame has 650 columns, so manually specifying the data types is obviously not feasible.
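For context, a minimal sketch of the direct read that comes back with all-object columns; the connection string and table name are placeholders, not taken from the question:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string and table name.
engine = create_engine("postgresql://user:pass@localhost:5432/mydb")

df = pd.read_sql("SELECT * FROM my_table", engine)
print(df.dtypes)  # every column reported as "object"
```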
Solution 1:[1]
It turns out all the column types in the database are defined as varchar.
It seems read_sql reads the schema and assigns dtypes based on it. What's strange is that I then couldn't convert those columns using infer_objects().
The only way I found was to write the data to a CSV and then read that CSV back with pd.read_csv().
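A minimal sketch of that round-trip, using an in-memory buffer instead of a file on disk; `df` is assumed to be the all-object DataFrame returned by pd.read_sql:

```python
import io
import pandas as pd

# df is the all-object DataFrame returned by pd.read_sql.
buffer = io.StringIO()
df.to_csv(buffer, index=False)   # serialise to CSV text in memory
buffer.seek(0)
df = pd.read_csv(buffer)         # read_csv re-infers dtypes from the values
print(df.dtypes)
```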
Solution 2:[2]
No, read_sql doesn't really check the database metadata.
Per the pandas docs, data types are inferred by default, while others must be enabled on demand (extra guidance is needed for date columns, for example).
So a bare pd.read_sql is not fully robust, but it may work on your specific data.
On my Postgres database this looks like:
| column_name | postgres | pandas |
|---|---|---|
| patient_id | character varying | object |
| spell_id | character varying | object |
| spell_start_date | date | object |
| spell_start_time | time without time zone | object |
| spell_end_date | date | object |
| spell_end_time | time without time zone | object |
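A hedged sketch of how that extra guidance can be supplied via read_sql's parse_dates argument; the connection string, table name, and the extra numeric column are assumptions for illustration, while the date column names come from the schema above:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection; date column names taken from the schema above.
engine = create_engine("postgresql://user:pass@localhost:5432/mydb")

df = pd.read_sql(
    "SELECT * FROM spells",
    engine,
    parse_dates=["spell_start_date", "spell_end_date"],  # guidance for date columns
)

# character varying columns still arrive as object; convert any that are
# really numeric explicitly (this column name is hypothetical):
df["length_of_stay"] = pd.to_numeric(df["length_of_stay"])
print(df.dtypes)
```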
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tom Smith |
| Solution 2 | Maciej S. |
