Pandas fill in missing monthly dates in DataFrame, fill up one specific column with zeros

I am having trouble filling in missing dates in a pandas DataFrame. The DataFrame is structured as follows:

     Amount  Code     Type   Date
0     34.97  J36J     74343 2016-01-01
1     16.32  J36J     74343 2016-04-01
2     10.30  J36J     69927 2015-12-01
3     10.45  J36J     69927 2016-07-01
4      5.63  J36J     69927 2017-03-01
5     15.79  J36J     69927 2018-09-01
6     15.00  J36J     69927 2019-06-01
7      6.44  J36J     69926 2016-03-01
8      6.47  J36J     69926 2017-03-01
9     15.00  J36J     69926 2018-07-01
10    15.00  J36J     69926 2019-06-01

Amount: the amount
Code: product code, identical throughout the entire DataFrame
Type: product type; there are many different ones
Date: a date within a range spanning December 2015 to September 2020

My goal is to have a monthly entry for every Type covering this timespan, meaning every Type should have 58 entries. The artificially created monthly entries should have an Amount of 0. So my expected output would be (just for one Type, as an example):

     Amount  Code     Type   Date
0     34.97  J36J     74343 2016-01-01
1      0     J36J     74343 2016-02-01
2      0     J36J     74343 2016-03-01
3     16.32  J36J     74343 2016-04-01
4     0      J36J     74343 2016-05-01
5     0      J36J     74343 2016-06-01
6     0      J36J     74343 2016-07-01
7     0      J36J     74343 2016-08-01
8     0      J36J     74343 2016-09-01
9     0      J36J     74343 2016-10-01
10    0      J36J     74343 2016-11-01
11    0      J36J     74343 2016-12-01

Fortunately, somebody already asked the same question ("Pandas fill in missing dates in DataFrame with multiple columns").

I adapted the quite helpful answer to my case:

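import pandas as pd

df.Date = pd.to_datetime(df.Date)
# month-start dates between the earliest and latest date in the data
s = pd.date_range(df.Date.min(), df.Date.max(), freq='MS')

# pivot the dates into columns, add the missing months, pivot back
df = (df.set_index(['Code', 'Type', 'Date'])
        .Amount.unstack().reindex(columns=s, fill_value=0).stack().reset_index())
df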

This worked quite well at first glance, but when I checked the resulting DataFrame, some of the dates were missing:

398     74343  J36J 2016-01-01  34.97
399     74343  J36J 2016-02-01   0.00
400     74343  J36J 2016-04-01  16.32
401     74343  J36J 2016-05-01   0.00
402     74343  J36J 2016-06-01   0.00
403     74343  J36J 2016-08-01   0.00
404     74343  J36J 2016-10-01   0.00
405     74343  J36J 2016-11-01   0.00
406     74343  J36J 2016-12-01   0.00

Do any of you know what could be the reason for this? I assume it might be the frequency ('MS') I chose, but I cannot see which of the other frequencies would fit better (https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html). Or do I have to set the date range manually? Obviously, not all dates are present in my initial DataFrame.

Any help on that matter is appreciated.

Solution 1:^[1]

One option is to use the complete function from pyjanitor to expose the implicitly missing rows; afterwards you can fill with fillna:

# pip install pyjanitor
import pandas as pd
import janitor

# create a dictionary of dates, spanning from December 2015 to September 2020
new_dates = {"Date": pd.date_range(start = "2015-12-01", end = "2020-09-01", freq="MS")}

# apply with `complete` for each group of `Type` and `Code`, 
# and fill the null values in `Amount` with 0:

df.complete(new_dates, by=["Type", "Code"], sort=True).fillna({'Amount': 0})

     Amount  Code   Type       Date
0       0.0  J36J   9927 2015-12-01
1       0.0  J36J   9927 2016-01-01
2       0.0  J36J   9927 2016-02-01
3       0.0  J36J   9927 2016-03-01
4       0.0  J36J   9927 2016-04-01
..      ...   ...    ...        ...
227     0.0  J36J  74343 2020-05-01
228     0.0  J36J  74343 2020-06-01
229     0.0  J36J  74343 2020-07-01
230     0.0  J36J  74343 2020-08-01
231     0.0  J36J  74343 2020-09-01

[232 rows x 4 columns]
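
A plain-pandas alternative, as a rough sketch rather than a definitive fix. The likely reason the unstack/reindex/stack attempt in the question loses rows is that unstack() leaves NaN wherever a month exists for another Type but not for the current one, the fill_value of reindex only applies to newly added columns, and stack() drops NaN entries by default in the pandas versions of that time. Reindexing the long-format Series against an explicit (Code, Type, Date) index avoids the round trip altogether. Assumptions: each (Code, Type, Date) combination occurs at most once, the date range is set manually as in the answer above, the sample frame is a shortened subset of the question's data, and the names months, pairs, full_index and filled are purely illustrative.

```python
import pandas as pd

# Shortened subset of the question's data (assumption: one row per Code/Type/Date)
df = pd.DataFrame({
    "Amount": [34.97, 16.32, 10.30, 10.45, 6.44],
    "Code": ["J36J"] * 5,
    "Type": [74343, 74343, 69927, 69927, 69926],
    "Date": pd.to_datetime(
        ["2016-01-01", "2016-04-01", "2015-12-01", "2016-07-01", "2016-03-01"]
    ),
})

# All month starts from December 2015 to September 2020 (58 months)
months = pd.date_range("2015-12-01", "2020-09-01", freq="MS")

# Every (Code, Type) pair that occurs in the data, crossed with every month
pairs = df[["Code", "Type"]].drop_duplicates()
full_index = pd.MultiIndex.from_tuples(
    [(code, typ, month)
     for code, typ in pairs.itertuples(index=False)
     for month in months],
    names=["Code", "Type", "Date"],
)

# Reindex the Amount series against the full index; missing months become 0
filled = (
    df.set_index(["Code", "Type", "Date"])["Amount"]
      .reindex(full_index, fill_value=0)
      .reset_index()
)

print(filled[filled["Type"] == 74343].head(6))
```

If the unstack/stack chain from the question is preferred, passing fill_value=0 to unstack() as well should keep those rows, since no NaN is then left for stack() to drop.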

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	sammywemmy
