Correct way of adding new columns/headers to a dataframe
I need to add new columns to a dataframe. Every column has a header and a value across all the rows (the value is the same for all the columns).
Right now I'm doing something like this:
array_of_new_headers = [...]
for column in array_of_new_headers:
    df[column] = 0
As a result I'm getting this message:
PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling
`frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
It tells me to use concat, but I don't really need to concatenate two dataframes. Should I use concat anyway for better performance and better code? To me it doesn't really make sense, unless maybe I think of the arrays as dataframes too.
Solution 1:[1]
You can pass an unpacked dictionary, with the column names as keys and the column values as values, to pandas.DataFrame.assign:
>>> array_of_new_headers = [...]
>>> df.assign(**{c:0 for c in array_of_new_headers})
Note that assign does not modify df in place; it returns a new DataFrame, so make sure to assign the result back to the required variable.
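A minimal runnable sketch of this approach, including the reassignment step. The sample frame and the column names `x`, `y`, `z` are placeholders made up for illustration, not taken from the question. It also assumes the new headers are strings, since they are passed to assign as keyword arguments:

```python
import pandas as pd

# Hypothetical sample data; the real df and header list come from your own code.
df = pd.DataFrame({"a": [1, 2, 3]})
array_of_new_headers = ["x", "y", "z"]

# assign returns a new DataFrame, so rebind the name to keep the added columns.
df = df.assign(**{c: 0 for c in array_of_new_headers})

print(df.columns.tolist())
```

Because all columns are added in a single call, this avoids inserting them one at a time in a loop.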
Solution 2:[2]
> should I use concat for better performance
Beware of so-called premature optimization: if your code already runs fast enough for your needs, you might end up simply wasting your time trying to make it faster.
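If the warning does turn out to matter for your workload, the pd.concat(axis=1) route it suggests could look roughly like the sketch below. The idea is to treat the new columns as a second dataframe that shares the index of the original, then join once. The sample data and the name `new_columns` are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical sample data; the real df and header list come from your own code.
df = pd.DataFrame({"a": [1, 2, 3]})
array_of_new_headers = ["x", "y", "z"]

# Build every new column at once as a frame of zeros aligned on df's index.
new_columns = pd.DataFrame(0, index=df.index, columns=array_of_new_headers)

# A single column-wise concat instead of one insert per column.
df = pd.concat([df, new_columns], axis=1)

print(df.shape)
```

Reusing `df.index` for the new frame matters here: concat aligns rows by index label, so a mismatched index would introduce NaN values instead of zeros.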
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ThePyGuy |
| Solution 2 | Daweo |
