Proper way to replace NaN values from another DataFrame on column match in pandas
I'm new to pandas and am trying to replace the NaN values in a column of df1 with the values from df2 where another column matches, but I'm getting the error shown below.
df1
unique_col | Measure
944537 NaN
7811403 NaN
8901242114307 1
df2
unique_col | Measure
944537 18
7811403 12
8901242114307 17.5
df1.loc[(df1.unique_col.isin(df2.unique_col) &
df1.Measure.isnull()), ['Measure']] = df2[['Measure']]
I have two dataframes with 3 million records, and performing the operation above raises the following error:
ValueError: cannot reindex from a duplicate axis
Solution 1:[1]
An easy way to fill NaNs is the fillna function. In your case, if you have the DataFrames as follows (notice the indexes):
unique_col Measure
0 944537 NaN
1 7811403 NaN
2 8901242114307 1.0
unique_col Measure
0 944537 18.0
1 7811403 12.0
2 8901242114307 17.5
You can simply call
>>> df.fillna(df2)
unique_col Measure
0 944537 18.0
1 7811403 12.0
2 8901242114307 1.0
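As a self-contained sketch of the call above, the snippet below rebuilds the two example frames by hand (the construction code is an assumption; only the values come from the question) and fills the NaNs by index alignment. Note that fillna returns a new DataFrame and leaves the original untouched unless you assign the result back.

```python
import numpy as np
import pandas as pd

# Example frames rebuilt from the question's data (default 0..2 index on both).
df = pd.DataFrame({
    "unique_col": [944537, 7811403, 8901242114307],
    "Measure": [np.nan, np.nan, 1.0],
})
df2 = pd.DataFrame({
    "unique_col": [944537, 7811403, 8901242114307],
    "Measure": [18.0, 12.0, 17.5],
})

# fillna with a DataFrame aligns on index and column labels, so each NaN in
# df is replaced by the value at the same row label and column in df2.
filled = df.fillna(df2)

print(filled)
```

Existing non-NaN values (the 1.0 in the last row) are kept; only the missing entries are taken from df2.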
If the indexes are not the same as above, you can set them to be the same and use the same function:
df = df.set_index('unique_col')
df.fillna(df2.set_index('unique_col'))
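The "cannot reindex from a duplicate axis" error in the question suggests that some labels are repeated, in which case index alignment may not be possible. The sketch below is a different technique from the one in this answer: it builds a lookup Series from df2 and applies it with Series.map, which tolerates repeated keys in df1. The duplicated row in df1 and the choice to keep the first occurrence of each key in df2 are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# df1 contains a repeated key (944537) to mimic the duplicate situation.
df1 = pd.DataFrame({
    "unique_col": [944537, 7811403, 8901242114307, 944537],
    "Measure": [np.nan, np.nan, 1.0, np.nan],
})
df2 = pd.DataFrame({
    "unique_col": [944537, 7811403, 8901242114307],
    "Measure": [18.0, 12.0, 17.5],
})

# Build a key -> value lookup. The lookup index must be unique for map,
# so keep only the first row per key (an assumption about which row wins).
lookup = df2.drop_duplicates("unique_col").set_index("unique_col")["Measure"]

# Map each key in df1 to its df2 value, then use that only where Measure is NaN.
candidates = df1["unique_col"].map(lookup)
filled = df1.assign(Measure=df1["Measure"].fillna(candidates))

print(filled)
```

Keys in df1 that have no match in df2 simply stay NaN, and unique_col remains an ordinary column rather than becoming the index.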
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | rafaelc |
