'Copy contents from one Dataframe to another based on column values in Pandas
Two seperate similar DataFrames with different lengths
df2=
| Index | First Name | Age | Gender | Weight |
|---|---|---|---|---|
| 0 | James | 25 | Male | 155 |
| 1 | John | 27 | Male | 175 |
| 2 | Patricia | 23 | Female | 135 |
| 3 | Mary | 22 | Female | 125 |
| 4 | Martin | 30 | Male | 185 |
| 5 | Margaret | 29 | Female | 141 |
| 6 | Kevin | 22 | Male | 198 |
df1=
| Index | First Name | Age | Gender | Weight | Height |
|---|---|---|---|---|---|
| 0 | James | 25 | Male | 165 | 5'10 |
| 1 | John | 27 | Male | 175 | 5'9 |
| 2 | Matthew | 29 | Male | 183 | 6'0 |
| 3 | Patricia | 23 | Female | 135 | 5'3 |
| 4 | Mary | 22 | Female | 125 | 5'4 |
| 5 | Rachel | 29 | Female | 123 | 5'3 |
| 6 | Jose | 20 | Male | 175 | 5'11 |
| 7 | Kevin | 22 | Male | 192 | 6'2 |
df2 has some rows which are not in df1 and df1 has some values which are not in df2.
I am comparing df1 against df2. I have calculated the newentries with the following code
newentries = df2.loc[~df2['First Name'].isin(df1['First Name'])]
deletedentries = df1.loc[~df1['First Name'].isin(df2['First Name'])]
where newentries denote the rows/entries that are there in df2 but not in df1; deletedentries denote the rows/entries that are there in df1 but not in df2. The above code works perfectly fine.
I need to copy the height from df1 to df2 when the first names are equal.
df2.loc[df2['First Name'].isin(df1['First Name']),"Height"] = df1.loc[df1['First Name'].isin(df2['First Name']),"Height"]
The above code copies the values however indexing is causing an issue and the values are not copied to the corresponding rows, I tried to promote First Name as the Index but that doesn't solve the issue. Please help me with a solution
Also, I need to calculate the modified values, if the First Name is same, I need to check for modified values; for example in df1, the weight of James is 155 however in df2 the weight is 165, so I need to store the modified weight of James(165) and index(0) in a new dataframe without iteration; the iteration takes a long time because this is a sample of a big dataframe with a lot more rows and columns.
Desired output:
df2=
| Index | First Name | Age | Gender | Weight | Height |
|---|---|---|---|---|---|
| 0 | James | 25 | Male | 155 | 5'10 |
| 1 | John | 27 | Male | 175 | 5'9 |
| 2 | Patricia | 23 | Female | 135 | 5'3 |
| 3 | Mary | 22 | Female | 125 | 5'4 |
| 4 | Martin | 30 | Male | 185 | |
| 5 | Margaret | 29 | Female | 141 | |
| 6 | Kevin | 22 | Male | 198 | 6'2 |
Martin's and Margaret's heights are not there in df1, so their heights are not updated in df2
newentries=
| Index | First Name | Age | Gender | Weight | Height |
|---|---|---|---|---|---|
| 4 | Martin | 30 | Male | 185 | |
| 5 | Margaret | 29 | Female | 141 |
deletedentries=
| Index | First Name | Age | Gender | Weight | Height |
|---|---|---|---|---|---|
| 2 | Matthew | 29 | Male | 183 | 6'0 |
| 5 | Rachel | 29 | Male | 123 | 5'3 |
| 6 | Jose | 20 | Male | 175 | 5'11 |
modval=
| Index | First Name | Age | Gender | Weight | Height |
|---|---|---|---|---|---|
| 0 | James | 165 | |||
| 7 | Kevin | 192 |
Solution 1:[1]
for your desired output for df2 you can try this:
desired_df2 = df2.merge(df1[['First Name','Height']], on='First Name', how='left')
#if you want to change the "NaN" values just add ".fillna(fill_value=0)" for e.g 0 after the merge
print(desired_df2)
First Name Age Gender Weight Height
0 James 25 Male 155 5'10
1 John 27 Male 175 5'9
2 Patricia 23 Female 135 5'3
3 Mary 22 Female 125 5'4
4 Martin 30 Male 185 NaN
5 Margaret 29 Female 141 NaN
6 Kevin 22 Male 198 6'2
new and deleted entries is already right. for the moment I'm a bit stuck how to get the modval dataframe. I'll update my answer if I get a solution.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
