Merging DataFrames based on conditions in Pandas
I have two DataFrames: the first has some revenue values set to zero because I don't have that data, and the second contains the data that is missing from the first.
import pandas as pd

# One row per (Date, Platform); dates are quoted so they are not evaluated as arithmetic
cols = ['Date', 'Platform', 'Day 1', 'Day 7', 'Day 14', 'Day 30']
df1 = pd.DataFrame([['2020-01', 'PC', 1, 1.2, 1.4, 1.8],
                    ['2020-01', 'Mobile', 1.2, 1.4, 1.6, 2],
                    ['2020-02', 'PC', 0.6, 1.4, 1.6, 0],
                    ['2020-02', 'Mobile', 0.5, 1, 1.5, 0],
                    ['2020-03', 'PC', 0.8, 1.3, 0, 0],
                    ['2020-03', 'Mobile', 0.6, 1.2, 0, 0]], columns=cols)
df2 = pd.DataFrame([['2020-02', 'PC', 0.6, 1.4, 1.6, 2],
                    ['2020-02', 'Mobile', 0.5, 1.3, 1.6, 2.2]], columns=cols)
What I want is to automatically drop the rows of the first DataFrame that have empty (zero) cells when their date also appears in the second DataFrame, and then concat the whole second DataFrame so the data is complete.
Is there any way to do this? I have tried removing the rows with zeros, but because other rows also contain zeros, this removes rows I want to keep.
This would be the final output:
Thanks!
Solution 1:[1]
To replace the 0 values in df1 with the values from df2 that share the same date, use:
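import numpy as np

# Treat zeros as missing, fill them from df2 by matching date, then restore zeros
df = (df1.set_index('date')
         .replace(0, np.nan)
         .combine_first(df2.set_index('date'))
         .fillna(0)
         .reset_index())
print(df)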
      date  Day 1  Day 7  Day 14  Day 30
0  2020-01    1.0    1.2     1.4     1.8
1  2020-02    0.6    1.4     1.6     2.0
2  2020-03    0.8    1.3     1.7     2.2
3  2020-04    0.7    0.0     0.0     0.0
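The snippet above keys on a single date column. As a minimal sketch of the same idea applied to the question's data, the example below assumes the frames are keyed on both Date and Platform (the quoted dates and the Platform column in df2 are a reconstruction of the question's sample, not confirmed by the asker). Note that combine_first only fills cells that are missing in df1, so a non-zero value in df1 is kept even when df2 disagrees.

```python
import numpy as np
import pandas as pd

# Reconstructed sample data (assumption: one row per Date/Platform pair)
cols = ["Date", "Platform", "Day 1", "Day 7", "Day 14", "Day 30"]
df1 = pd.DataFrame(
    [
        ["2020-01", "PC", 1.0, 1.2, 1.4, 1.8],
        ["2020-01", "Mobile", 1.2, 1.4, 1.6, 2.0],
        ["2020-02", "PC", 0.6, 1.4, 1.6, 0.0],
        ["2020-02", "Mobile", 0.5, 1.0, 1.5, 0.0],
        ["2020-03", "PC", 0.8, 1.3, 0.0, 0.0],
        ["2020-03", "Mobile", 0.6, 1.2, 0.0, 0.0],
    ],
    columns=cols,
)
df2 = pd.DataFrame(
    [
        ["2020-02", "PC", 0.6, 1.4, 1.6, 2.0],
        ["2020-02", "Mobile", 0.5, 1.3, 1.6, 2.2],
    ],
    columns=cols,
)

keys = ["Date", "Platform"]
filled = (
    df1.set_index(keys)
    .replace(0, np.nan)                  # zeros mean "no data" here
    .combine_first(df2.set_index(keys))  # fill missing cells from df2 by key
    .fillna(0)                           # cells df2 cannot fill go back to 0
    .reset_index()
)
print(filled)
```

With this data, the 2020-02 rows get their Day 30 values from df2, while the 2020-03 rows keep their zeros because df2 has nothing for that month.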
Alternatively, if averaging is acceptable, concatenate both DataFrames and take the mean per date, ignoring the 0 values in df1:
df = (pd.concat([df1.replace(0, np.nan), df2])
        .groupby('date', as_index=False).mean()
        .fillna(0))
print(df)
      date  Day 1  Day 7  Day 14  Day 30
0  2020-01    1.0    1.2     1.4     1.8
1  2020-02    0.6    1.4     1.6     2.0
2  2020-03    0.8    1.3     1.7     2.2
3  2020-04    0.7    0.0     0.0     0.0
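A rough sketch of the averaging variant on the question's data follows, again assuming the frames are keyed on Date and Platform (the sample values are reconstructed from the question). It mainly illustrates a caveat: where df1 and df2 both hold a non-zero value for the same cell, the result is their mean rather than the df2 value.

```python
import numpy as np
import pandas as pd

# Reconstructed sample data (assumption: one row per Date/Platform pair)
cols = ["Date", "Platform", "Day 1", "Day 7", "Day 14", "Day 30"]
df1 = pd.DataFrame(
    [
        ["2020-01", "PC", 1.0, 1.2, 1.4, 1.8],
        ["2020-01", "Mobile", 1.2, 1.4, 1.6, 2.0],
        ["2020-02", "PC", 0.6, 1.4, 1.6, 0.0],
        ["2020-02", "Mobile", 0.5, 1.0, 1.5, 0.0],
        ["2020-03", "PC", 0.8, 1.3, 0.0, 0.0],
        ["2020-03", "Mobile", 0.6, 1.2, 0.0, 0.0],
    ],
    columns=cols,
)
df2 = pd.DataFrame(
    [
        ["2020-02", "PC", 0.6, 1.4, 1.6, 2.0],
        ["2020-02", "Mobile", 0.5, 1.3, 1.6, 2.2],
    ],
    columns=cols,
)

keys = ["Date", "Platform"]
averaged = (
    pd.concat([df1.replace(0, np.nan), df2])  # zeros in df1 become missing
    .groupby(keys, as_index=False)
    .mean()                                   # mean skips missing values
    .fillna(0)                                # groups with no data at all become 0
)
print(averaged)
```

Here Day 7 for 2020-02 Mobile is 1.0 in df1 and 1.3 in df2, so the averaged frame reports their mean. If df2 should always win, the combine_first approach with the frames swapped, or dropping the matching rows from df1 before concatenating, is likely a better fit.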
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |



