What's the best way to insert columns in a pandas DataFrame when you don't know the exact number of columns?

I have an input dataframe.

I also have a list whose length equals the number of rows in the dataframe.

Every element of the list is a dictionary: the key is the name of the new column, and the value is the value to be inserted in the cell.

I have to insert the columns from that list in the dataframe.

What is the best way to do so?

So far, given the input dataframe indf and the list l, I came up with something along the lines of:

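from copy import deepcopy
outdf = deepcopy(indf)
# l[index] assumes indf has the default 0..n-1 integer index
for index, row in indf.iterrows():
    e = l[index]
    # iterate over key/value pairs; iterating the dict directly yields only keys
    for key, value in e.items():
        outdf.loc[index, key] = value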

But it doesn't seem Pythonic or idiomatic for pandas, and I get performance warnings like:

<ipython-input-5-9dde586a9c14>:8: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead.  To get a de-fragmented frame, use `newframe = frame.copy()`


Solution 1:[1]

If the list is in the same order as the rows of the data frame, you can convert your list of dictionaries to a data frame:

import pandas as pd

mylist = [
    {'a':1,'b':2,'c':3},
    {'e':11,'f':22,'c':33},
    {'a':111,'b':222,'c':333}
]

mylist_df = pd.DataFrame(mylist)
print(mylist_df)
#        a      b    c     e     f
# 0    1.0    2.0    3   NaN   NaN
# 1    NaN    NaN   33  11.0  22.0
# 2  111.0  222.0  333   NaN   NaN

Then you can use pd.concat to join that data frame to your input data frame column-wise:

result = pd.concat([input_df, mylist_df], axis=1)

This way, a column is created for every unique key across the dictionaries, regardless of whether a key appears in only some of them. Rows whose dictionary lacks a given key get NaN in that column.
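One caveat worth sketching: pd.concat with axis=1 pairs rows by index label, not by position. The example below is a minimal sketch that assumes the list is ordered like the rows of the input data frame. The frame input_df, its column x, and the labels r1 to r3 are made up for illustration. Passing the input frame's index when building the second frame keeps the rows paired even when the index is not the default 0..n-1 range.

```python
import pandas as pd

# Hypothetical input frame with a non-default index, to show why alignment matters.
input_df = pd.DataFrame({"x": [10, 20, 30]}, index=["r1", "r2", "r3"])

# One dictionary per row, in the same order as the rows of input_df.
mylist = [
    {"a": 1, "b": 2, "c": 3},
    {"e": 11, "f": 22, "c": 33},
    {"a": 111, "b": 222, "c": 333},
]

# Reuse the input index so that pd.concat pairs rows by label.
mylist_df = pd.DataFrame(mylist, index=input_df.index)

# All new columns are attached in one operation instead of one insert per cell,
# which is what the PerformanceWarning in the question recommends.
result = pd.concat([input_df, mylist_df], axis=1)
print(result)
```

If a dictionary key has the same name as an existing column of the input frame, pd.concat keeps both, so the result ends up with duplicate column names. Renaming the new columns first is one way to avoid that.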

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1