'TypeError - Attempting to SQL Insert : sequence item 0: expected str instance, dict found
I have a fairly large pandas DataFrame that I am attempting to INSERT into a MySQL table.
import pandas as pd
# mysql connection
import pymysql
from sqlalchemy import create_engine
user = 'user'
pwd = 'xxxxx'
host = 'x.com'
port = 3306
database = 'dbname'
engine = create_engine("mysql+pymysql://{}:{}@{}/{}".format(user,pwd,host,database))
# column dtypes
[
dtype('O'),
dtype('O'),
dtype('O'),
dtype('int64'),
dtype('float64'),
dtype('float64'),
dtype('O'),
dtype('O')
]
Here are the steps I take for dict to str:
# Find columns containing dicts
col_dict = [i for i in df.columns if isinstance(df[i][0],dict)]
print(col_dict)
[] <-- empty list
I checked the datatype of columns itself, not the data and it's all <class 'str'>.
col_type = [type(i) for i in df.columns] print(col_dict)
# Dict to string
df[col_dict] = df[col_dict].astype(str)
# Insert into SQL
con = engine.connect()
df.to_sql(name="df_table", con=con, if_exists = 'replace')
con.close()
Traceback:
/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pymysql/converters.py in escape_item(val, charset, mapping)
21
22 if encoder in (escape_dict, escape_sequence):
---> 23 val = encoder(val, charset, mapping)
24 else:
25 val = encoder(val, mapping)
/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pymysql/converters.py in escape_dict(val, charset, mapping)
30 n = {}
31 for k, v in val.items():
---> 32 quoted = escape_item(v, charset, mapping)
33 n[k] = quoted
34 return n
/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pymysql/converters.py in escape_item(val, charset, mapping)
21
22 if encoder in (escape_dict, escape_sequence):
---> 23 val = encoder(val, charset, mapping)
24 else:
25 val = encoder(val, mapping)
/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pymysql/converters.py in escape_sequence(val, charset, mapping)
40 quoted = escape_item(item, charset, mapping)
41 n.append(quoted)
---> 42 return "(" + ",".join(n) + ")"
43
44
TypeError: sequence item 0: expected str instance, dict found
Solution 1:[1]
As recommended by pandas authors, consider using the StringDtype (introduced in Pandas 1.0.0) to store text data by converting with astype("string") and not object type derived from astype("str") which the latter type can store a mix of anything and not just strings. Consequently, your column of dictionaries remained dict types and hence the incompatibility database error.
To demonstrate, a column of dicts render its cell values as dict types even though the Series is an object dtype:
list_of_dicts = [
{"test1": [1,2,3], "test2": ['a', 'b', 'c']},
{"example1": [7, 8, 9], "example2": ['x', 'y', 'z']}
]
df = pd.DataFrame({"col_dict": list_of_dicts})
df["col_dict"]
# 0 {'test1': [1, 2, 3], 'test2': ['a', 'b', 'c']}
# 1 {'example1': [7, 8, 9], 'example2': ['x', 'y',...
# Name: col_dict, dtype: object
type(df["col_dict"][0])
# dict
isinstance(df["col_dict"][0], dict)
# True
However, converting to StringDType, column values render as str types. Note: the abbreviated str is not the same as "string".
df = pd.DataFrame({"col_dict": pd.Series(list_of_dicts, dtype="string")})
df["col_dict"]
# 0 {'test1': [1, 2, 3], 'test2': ['a', 'b', 'c']}
# 1 {'example1': [7, 8, 9], 'example2': ['x', 'y',...
# Name: col_dict, dtype: string
type(df["col_dict"][0])
# str
isinstance(df["col_dict"][0], dict)
# False
Below are the multiple ways to cast, map, or parse column of dicts to StringDType:
df[col_dict] = df[col_dict].astype("string")
df = pd.DataFrame({"col_dict": pd.Series([list_of_dicts], dtype="string")})
df = pd.read_csv("input.csv", dtype={"col_dict": "string"})
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Parfait |
