Pandas Series containing bytes or string data to JSON or dictionary type

I have a pandas Series of bytes values that I'd like to convert so that I can parse and manipulate its contents.

import pandas as pd
from ast import literal_eval

df = pd.DataFrame({'id': [0],
                   'bdata': ["b'{\"status\":\"SuccessWithResult\",\"total\":13}"]
                 })

type(df['bdata'][0])

bytes

# Transform to dict
df['bdata'] = df['bdata'].apply(literal_eval)

ValueError: malformed node or string: b'

How do I convert a pandas Series of type bytes to either JSON or dict type?

  • The data may appear as str, but it is actually stored as bytes in the pandas DataFrame.


Solution 1:[1]

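The values in the bdata column of the example are not bytes; they are strings, as type(df['bdata'][0]) tells you when run on that DataFrame. The b' is misleading: it is part of the string's text, not a bytes prefix. So you have to remove the characters b' from the string before applying literal_eval. You can do that with Series.str.strip, which removes any leading and trailing characters belonging to the given set (here b and ').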

from ast import literal_eval
import pandas as pd

df = pd.DataFrame({'id': [0],
                   'bdata': ["b'{\"status\":\"SuccessWithResult\",\"total\":13}"]
                 })

df['bdata'] = df['bdata'].str.strip("b'").apply(literal_eval)

Output:

>>> df['bdata']

0    {'status': 'SuccessWithResult', 'total': 13}
Name: bdata, dtype: object

>>> df['bdata'].apply(type)

0    <class 'dict'>
Name: bdata, dtype: object
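
If some values really are bytes objects, as the question suggests, str.strip will not work on them and they need decoding first. The following is only a sketch of one way to cover both cases: to_dict is a hypothetical helper name, UTF-8 is an assumed encoding, and json.loads is used in place of literal_eval on the assumption that the payload is valid JSON.

```python
import json

import pandas as pd


def to_dict(value):
    """Parse a JSON payload held as bytes, or as a str that looks like a bytes literal."""
    if isinstance(value, bytes):
        # Real bytes: decode to text first (UTF-8 is an assumption).
        value = value.decode("utf-8")
    elif value.startswith("b'"):
        # A str that merely looks like bytes: drop the leading b' and any trailing quote.
        value = value[2:].rstrip("'")
    return json.loads(value)


df = pd.DataFrame({
    "id": [0, 1],
    "bdata": [
        b'{"status":"SuccessWithResult","total":13}',        # actual bytes
        "b'{\"status\":\"SuccessWithResult\",\"total\":13}",  # str from the question
    ],
})

df["parsed"] = df["bdata"].apply(to_dict)
print(df["parsed"].apply(type))
```

Unlike strip("b'"), slicing off the two-character prefix cannot remove a legitimate b or ' from the start or end of the payload itself.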

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Rodalm