'Copy contents from one Dataframe to another based on column values in Pandas

Two seperate similar DataFrames with different lengths

df2=

Index First Name Age Gender Weight
0 James 25 Male 155
1 John 27 Male 175
2 Patricia 23 Female 135
3 Mary 22 Female 125
4 Martin 30 Male 185
5 Margaret 29 Female 141
6 Kevin 22 Male 198

df1=

Index First Name Age Gender Weight Height
0 James 25 Male 165 5'10
1 John 27 Male 175 5'9
2 Matthew 29 Male 183 6'0
3 Patricia 23 Female 135 5'3
4 Mary 22 Female 125 5'4
5 Rachel 29 Female 123 5'3
6 Jose 20 Male 175 5'11
7 Kevin 22 Male 192 6'2

df2 has some rows which are not in df1 and df1 has some values which are not in df2.

I am comparing df1 against df2. I have calculated the newentries with the following code

newentries = df2.loc[~df2['First Name'].isin(df1['First Name'])]
deletedentries = df1.loc[~df1['First Name'].isin(df2['First Name'])]

where newentries denote the rows/entries that are there in df2 but not in df1; deletedentries denote the rows/entries that are there in df1 but not in df2. The above code works perfectly fine.

I need to copy the height from df1 to df2 when the first names are equal.

df2.loc[df2['First Name'].isin(df1['First Name']),"Height"] = df1.loc[df1['First Name'].isin(df2['First Name']),"Height"]

The above code copies the values however indexing is causing an issue and the values are not copied to the corresponding rows, I tried to promote First Name as the Index but that doesn't solve the issue. Please help me with a solution

Also, I need to calculate the modified values, if the First Name is same, I need to check for modified values; for example in df1, the weight of James is 155 however in df2 the weight is 165, so I need to store the modified weight of James(165) and index(0) in a new dataframe without iteration; the iteration takes a long time because this is a sample of a big dataframe with a lot more rows and columns.

Desired output:

df2=

Index First Name Age Gender Weight Height
0 James 25 Male 155 5'10
1 John 27 Male 175 5'9
2 Patricia 23 Female 135 5'3
3 Mary 22 Female 125 5'4
4 Martin 30 Male 185
5 Margaret 29 Female 141
6 Kevin 22 Male 198 6'2

Martin's and Margaret's heights are not there in df1, so their heights are not updated in df2

newentries=

Index First Name Age Gender Weight Height
4 Martin 30 Male 185
5 Margaret 29 Female 141

deletedentries=

Index First Name Age Gender Weight Height
2 Matthew 29 Male 183 6'0
5 Rachel 29 Male 123 5'3
6 Jose 20 Male 175 5'11

modval=

Index First Name Age Gender Weight Height
0 James 165
7 Kevin 192


Solution 1:[1]

for your desired output for df2 you can try this:

desired_df2 = df2.merge(df1[['First Name','Height']], on='First Name', how='left')
#if you want to change the "NaN" values just add ".fillna(fill_value=0)" for e.g 0 after the merge
print(desired_df2)


  First Name  Age   Gender  Weight Height
0     James    25    Male      155   5'10
1      John    27    Male      175    5'9
2  Patricia    23  Female      135    5'3
3      Mary    22  Female      125    5'4
4    Martin    30    Male      185    NaN
5  Margaret    29  Female      141    NaN
6     Kevin    22    Male      198    6'2

new and deleted entries is already right. for the moment I'm a bit stuck how to get the modval dataframe. I'll update my answer if I get a solution.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1