'Pandas fill in missing dates in DataFrame with multiple columns
I want to add missing dates for a specific date range, but keep all columns. I found many posts using afreq(), resample(), reindex(), but they seemed to be for Series and I couldn't get them to work for my DataFrame.
Given a sample dataframe:
data = [{'id' : '123', 'product' : 'apple', 'color' : 'red', 'qty' : 10, 'week' : '2019-3-7'}, {'id' : '123', 'product' : 'apple', 'color' : 'blue', 'qty' : 20, 'week' : '2019-3-21'}, {'id' : '123', 'product' : 'orange', 'color' : 'orange', 'qty' : 8, 'week' : '2019-3-21'}]
df = pd.DataFrame(data)
color id product qty week
0 red 123 apple 10 2019-3-7
1 blue 123 apple 20 2019-3-21
2 orange 123 orange 8 2019-3-21
My goal is to return below; filling in qty as 0, but fill other columns. Of course, I have many other ids. I would like to be able to specify the start/end dates to fill; this example uses 3/7 to 3/21.
color id product qty week
0 red 123 apple 10 2019-3-7
1 blue 123 apple 20 2019-3-21
2 orange 123 orange 8 2019-3-21
3 red 123 apple 0 2019-3-14
4 red 123 apple 0 2019-3-21
5 blue 123 apple 0 2019-3-7
6 blue 123 apple 0 2019-3-14
7 orange 123 orange 0 2019-3-7
8 orange 123 orange 0 2019-3-14
How can I keep the remainder of my DataFrame intact?
Solution 1:[1]
One option is to use the complete function from pyjanitor to expose the implicitly missing rows; afterwards you can fill with fillna:
# pip install pyjanitor
import pandas as pd
import janitor
df.week = pd.to_datetime(df.week)
# create new dates, which will be used to expand the dataframe
new_dates = {"week": pd.date_range(df.week.min(), df.week.max(), freq="7D")}
# use the complete function
# note how color, id and product are wrapped together
# this ensures only missing values based on data in the dataframe is exposed
# if you want all combinations, then you get rid of the tuple,
(df
.complete(("color", "id", "product"), new_dates, sort = False)
.fillna({'qty':0, downcast='infer')
)
id product color qty week
0 123 apple red 10 2019-03-07
1 123 apple blue 20 2019-03-21
2 123 orange orange 8 2019-03-21
3 123 apple red 0 2019-03-14
4 123 apple red 0 2019-03-21
5 123 apple blue 0 2019-03-07
6 123 apple blue 0 2019-03-14
7 123 orange orange 0 2019-03-07
8 123 orange orange 0 2019-03-14
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
