TypeError when attempting a SQL INSERT: sequence item 0: expected str instance, dict found

I have a fairly large pandas DataFrame that I am attempting to INSERT into a MySQL table.

import pandas as pd 

# mysql connection
import pymysql
from sqlalchemy import create_engine
user = 'user'
pwd = 'xxxxx'
host =  'x.com'
port = 3306
database = 'dbname'

engine = create_engine("mysql+pymysql://{}:{}@{}:{}/{}".format(user, pwd, host, port, database))


# column dtypes
[
  dtype('O'),
  dtype('O'),
  dtype('O'),
  dtype('int64'),
  dtype('float64'),
  dtype('float64'),
  dtype('O'),
  dtype('O')
]

Here are the steps I take for dict to str:

# Find columns containing dicts
col_dict = [i for i in df.columns if isinstance(df[i][0],dict)]
print(col_dict)
[] <-- empty list

I also checked the type of the column labels themselves (not the data), and they are all <class 'str'>.

col_type = [type(i) for i in df.columns]
print(col_type)

# Dict to string
df[col_dict] = df[col_dict].astype(str)

# Insert into SQL
con = engine.connect()
df.to_sql(name="df_table", con=con, if_exists = 'replace')
con.close()
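As an aside on the detection step: the col_dict check above only inspects the first row of each column (df[i][0]), so any dict that appears further down the column goes unnoticed. A minimal sketch (not from the original post, using a made-up frame) that scans every row instead:

```python
import pandas as pd

# Hypothetical frame where the dict hides in the second row, so a
# first-row-only check reports no dict columns at all.
df = pd.DataFrame({
    "name": ["a", "b"],
    "meta": ["plain string", {"k": 1}],
})

# Original approach: check only row 0 -- misses the dict
first_row_only = [c for c in df.columns if isinstance(df[c].iloc[0], dict)]

# Check every row instead
all_rows = [c for c in df.columns
            if df[c].map(lambda v: isinstance(v, dict)).any()]

print(first_row_only)  # []
print(all_rows)        # ['meta']
```

With an empty col_dict, the later df[col_dict].astype(str) is a no-op, which would leave the dicts in place for to_sql to choke on.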

Traceback:

/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pymysql/converters.py in escape_item(val, charset, mapping)
     21 
     22     if encoder in (escape_dict, escape_sequence):
---> 23         val = encoder(val, charset, mapping)
     24     else:
     25         val = encoder(val, mapping)

/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pymysql/converters.py in escape_dict(val, charset, mapping)
     30     n = {}
     31     for k, v in val.items():
---> 32         quoted = escape_item(v, charset, mapping)
     33         n[k] = quoted
     34     return n

/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pymysql/converters.py in escape_item(val, charset, mapping)
     21 
     22     if encoder in (escape_dict, escape_sequence):
---> 23         val = encoder(val, charset, mapping)
     24     else:
     25         val = encoder(val, mapping)

/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/pymysql/converters.py in escape_sequence(val, charset, mapping)
     40         quoted = escape_item(item, charset, mapping)
     41         n.append(quoted)
---> 42     return "(" + ",".join(n) + ")"
     43 
     44 

TypeError: sequence item 0: expected str instance, dict found


Solution 1:[1]

As recommended by the pandas authors, consider using StringDtype (introduced in pandas 1.0.0) to store text data, converting with astype("string") rather than relying on the object dtype produced by astype(str). An object column can hold a mix of anything, not just strings, so your column of dictionaries remained dict-typed, hence the incompatibility error from the database driver.

To demonstrate, a column of dicts keeps its cell values as dict types even though the Series reports an object dtype:

list_of_dicts = [
    {"test1": [1,2,3], "test2": ['a', 'b', 'c']},
    {"example1": [7, 8, 9], "example2": ['x', 'y', 'z']}
]

df = pd.DataFrame({"col_dict": list_of_dicts})
df["col_dict"]
# 0       {'test1': [1, 2, 3], 'test2': ['a', 'b', 'c']}
# 1    {'example1': [7, 8, 9], 'example2': ['x', 'y',...
# Name: col_dict, dtype: object

type(df["col_dict"][0])
# dict
isinstance(df["col_dict"][0], dict)
# True

However, after converting to StringDtype, the column values render as str types. Note: the Python built-in str is not the same as the pandas "string" dtype.

df = pd.DataFrame({"col_dict": pd.Series(list_of_dicts, dtype="string")})
df["col_dict"]
# 0       {'test1': [1, 2, 3], 'test2': ['a', 'b', 'c']}
# 1    {'example1': [7, 8, 9], 'example2': ['x', 'y',...
# Name: col_dict, dtype: string

type(df["col_dict"][0])
# str
isinstance(df["col_dict"][0], dict)
# False
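To show the cast actually unblocks the insert, here is a sketch using an in-memory SQLite database as a stand-in for MySQL (an assumption; the original post targets pymysql, but to_sql behaves the same way for this purpose):

```python
import sqlite3
import pandas as pd

list_of_dicts = [
    {"test1": [1, 2, 3], "test2": ["a", "b", "c"]},
    {"example1": [7, 8, 9], "example2": ["x", "y", "z"]},
]

df = pd.DataFrame({"col_dict": list_of_dicts})
# The recommended cast: dict cells become str under StringDtype
df["col_dict"] = df["col_dict"].astype("string")

con = sqlite3.connect(":memory:")
df.to_sql(name="df_table", con=con, if_exists="replace", index=False)

out = pd.read_sql("SELECT col_dict FROM df_table", con)
con.close()

# The stored values are plain text now
print(all(isinstance(v, str) for v in out["col_dict"]))
```

Without the astype("string") line, the same to_sql call fails to bind the dict values, mirroring the pymysql traceback above.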

Below are multiple ways to cast, map, or parse a column of dicts to StringDtype:

df[col_dict] = df[col_dict].astype("string")
df = pd.DataFrame({"col_dict": pd.Series(list_of_dicts, dtype="string")})
df = pd.read_csv("input.csv", dtype={"col_dict": "string"})
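One editorial caveat, not from the original answer: casting a dict with str() or astype("string") stores the Python repr (single quotes), which is not valid JSON. If you ever need to parse the stored text back into dicts, serializing with json.dumps first may be preferable, assuming the values are JSON-compatible:

```python
import json
import pandas as pd

list_of_dicts = [
    {"test1": [1, 2, 3], "test2": ["a", "b", "c"]},
    {"example1": [7, 8, 9], "example2": ["x", "y", "z"]},
]

df = pd.DataFrame({"col_dict": list_of_dicts})

# json.dumps yields valid JSON text that round-trips through json.loads,
# unlike the Python repr produced by astype(str) / astype("string")
df["col_dict"] = df["col_dict"].map(json.dumps).astype("string")

roundtrip = df["col_dict"].map(json.loads)
print(roundtrip[0] == list_of_dicts[0])  # True
```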

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Parfait