'Insert new data to dataframe
I have a dataframe
employees = [('Jack', 34, 'Sydney' ) ,
('Riti', 31, 'Delhi' ) ,
('Aadi', 16, 'London') ,
('Mark', 18, 'Delhi' )]
dataFrame = pd.DataFrame( employees,
columns=['Name', 'Age', 'City'])
I would like to append this DataFrame with some new columns. I did it with:
data = ['Height', 'Weight', 'Eyecolor']
duduFrame = pd.DataFrame(columns=data)
This results in:
Name Age City Height Weight Eyecolor
0 Jack 34.0 Sydney NaN NaN NaN
1 Riti 31.0 Delhi NaN NaN NaN
2 Aadi 16.0 London NaN NaN NaN
3 Mark 18.0 Delhi NaN NaN NaN
So far so good.
Now I have new Data about Height, Weight and Eyecolor for "Riti":
Riti_data = [(172, 74, 'Brown')]
This I would like to add to dataFrame.
I tried it with
dataFrame.loc['Riti', [duduFrame]] = Riti_data
But I get the error
ValueError: Buffer has wrong number of dimensions (expected 1, got 3)
What am I doing wrong?
Solution 1:[1]
try this :
dataFrame.loc[dataFrame['Name']=='Riti', ['Height','Weight','Eyecolor']] = Riti_data
your mistake I think was not to specify the columns you did : duduFrame instead of the data which contains the name columns you want to add the new value
Solution 2:[2]
You can do this :
df = pd.concat([dataFrame, duduFrame])
df = df.set_index('Name')
df.loc['Riti',data] = [172,74,'Brown']
Resulting in :
Age City Height Weight Eyecolor
Name
Jack 34.0 Sydney NaN NaN NaN
Riti 31.0 Delhi 172 74 Brown
Aadi 16.0 London NaN NaN NaN
Mark 18.0 Delhi NaN NaN NaN
Solution 3:[3]
Pandas has a pd.concat function, whose role is to concatenate dataframes, either vertically (axis = 0), or in your case horizontally (axis = 1).
However, I personally see merging horizontally more like a pd.merge use-case, which gives you more flexibility on how exactly do you want the merge to happen.
In your case, you want to match Name column, right ?
So I would do it in 2 steps:
- Build both dataframes with column
Nameand their respective data - Merge both dataframes with
pd.merge(df1, df2, on = 'Name', how = 'outer')
The how = outer parameter makes sure that you don't lose any data from df1 or df2, in case some Name has data in only one of both dataframes. This will be easier for you to catch errors with your data, and will make you think more in terms of SQL JOIN, which is a necessary way of thinking :).
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | DataSciRookie |
| Solution 2 | LCMa |
| Solution 3 |
