How to efficiently convert a List of List[dict] to a pandas DataFrame

I have a request that looks something like this:

[
   [
      {
         "name":"signal0",
         "string_value":"0.3705361587563305"
      },
      {
         "name":"signal1",
         "string_value":"12"
      },
      ...

   ]
]

and a datatype_mapper:

{
   "signal0":"float",
   "signal1":"int",
   "signal2":"str",
   "signal3":"bool"
}

I want to transform this request into a pandas DataFrame and change its column datatypes based on the mapper, to get something like this:

    signal0 signal1 
0   0.370536    12  
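To make the target concrete, here is the sample request and mapper as Python objects. It comes with a minimal sketch that builds the desired frame column by column and then casts it with the mapper. This is an illustration of the transformation, not a benchmarked answer. The helper name `to_frame` is hypothetical, and it assumes every inner list holds the same signals in the same order as the first one.

```python
import pandas as pd

# The sample request: a list of rows, each row a list of name/value dicts.
request = [
    [
        {"name": "signal0", "string_value": "0.3705361587563305"},
        {"name": "signal1", "string_value": "12"},
    ]
]

datatype_mapper = {
    "signal0": "float",
    "signal1": "int",
    "signal2": "str",
    "signal3": "bool",
}


def to_frame(req, dtype_map):
    """Hypothetical helper: build the frame column by column, then cast by mapper."""
    # Assumes every row lists the same signals in the same order as the first row.
    names = [d["name"] for d in req[0]]
    # zip(*rows) transposes the row-wise string values into one tuple per column.
    columns = zip(*[[d["string_value"] for d in row] for row in req])
    df = pd.DataFrame({name: list(col) for name, col in zip(names, columns)})
    # Cast only the columns that have a mapper entry; others keep their string dtype.
    return df.astype({n: dtype_map[n] for n in names if n in dtype_map})


df = to_frame(request, datatype_mapper)
print(df)
print(df.dtypes)
```

Be careful with signals mapped to "bool". Depending on the pandas version, `astype("bool")` on strings may only test whether each string is non-empty. A value such as "False" could then become True, so boolean signals are worth checking separately.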

Both solutions below work for now, but neither is efficient once the list grows larger.

Option 1: 1.17 ms ± 64.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

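import pandas as pd

def option_1(new_req, dtype_map):
    # Column names come from the first row; every row is assumed to share them.
    column_names = list(items['name'] for items in new_req[0])
    # One tuple of string values per row (was `requests`, an undefined name).
    values = list(tuple(i['string_value'] for i in item) for item in new_req)
    df = pd.DataFrame(values, columns=column_names)
    # Cast only the columns that have an entry in the mapper.
    dtypes = {x: dtype_map[x] for x in column_names if x in dtype_map}
    return df.astype(dtype=dtypes)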

Option 2: 292 µs ± 14.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

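import numpy as np
import pandas as pd

def option_2(new_req, dtype_map):
    column_names = list(items['name'] for items in new_req[0])
    values = list(tuple(i['string_value'] for i in item) for item in new_req)
    # Structured dtype: one (name, type) field per mapped column.
    dtypes = [(x, dtype_map[x]) for x in column_names if x in dtype_map]
    # NumPy converts each string to its field's type while building the record array.
    return pd.DataFrame(np.array(values, dtype=dtypes))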
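Option 2 relies on NumPy structured arrays. Each row tuple is assigned field by field to a named, typed record, and pandas then turns each field into a column. The small sketch below isolates that mechanism. The two rows of values are illustrative and do not come from a real request.

```python
import numpy as np
import pandas as pd

# Row-wise string values, one tuple per row, as option 2 builds them
# (illustrative values, not taken from a real request).
values = [("0.3705361587563305", "12"), ("1.5", "7")]

# A structured dtype: one named, typed field per column.
dtypes = [("signal0", "float"), ("signal1", "int")]

# NumPy assigns each tuple element to its field, converting the string on the way.
records = np.array(values, dtype=dtypes)

# pandas turns each field of the structured array into a column of the same name.
df = pd.DataFrame(records)
print(df.dtypes)
```

Option 2 builds the field list only from names that appear in `dtype_map`. If a signal has no mapper entry, each tuple ends up longer than the number of fields, and NumPy should refuse to build the array. As written, option 2 therefore assumes a complete mapper.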

For 4000 rows with 200 features:

v1 takes 262 ms ± 8.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

v2 takes 297 ms ± 14.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
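A synthetic request of the same shape (4000 rows, 200 signals) can stand in for a real one when reproducing the measurement. This is a sketch under stated assumptions:

- `make_request` and `baseline` are hypothetical names.
- Every signal is treated as float-valued.
- `baseline` mirrors option 1's row-wise construction.

Timings depend on the machine and the library versions, so no expected numbers are given.

```python
import random
import timeit

import pandas as pd


def make_request(n_rows, n_signals, seed=0):
    # Hypothetical generator: every signal is float-valued, stored as a string.
    rng = random.Random(seed)
    return [
        [{"name": f"signal{j}", "string_value": repr(rng.random())} for j in range(n_signals)]
        for _ in range(n_rows)
    ]


def baseline(req, dtype_map):
    # Same row-wise construction as option 1, used here only as something to time.
    names = [d["name"] for d in req[0]]
    values = [tuple(d["string_value"] for d in row) for row in req]
    df = pd.DataFrame(values, columns=names)
    return df.astype({n: dtype_map[n] for n in names if n in dtype_map})


request = make_request(4000, 200)
mapper = {f"signal{j}": "float" for j in range(200)}

# Average wall-clock time of one call over three runs.
seconds = timeit.timeit(lambda: baseline(request, mapper), number=3) / 3
print(f"{seconds * 1000:.1f} ms per call")
```

Swapping `baseline` for option_1 or option_2 in the `timeit` call compares them on identical data.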

Option 2 seems to be the better choice for now, but it does not scale well to bigger requests. Is there a more efficient way to transform my request into the desired DataFrame format?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
