'Pandas - imputing zero rows for missing months [duplicate]
I have a dataframe that looks like this:
| Product Name | Product Category | Month | Sales ($) | # Customers |
|---|---|---|---|---|
| Potato | Produce | Jan 2019 | 2 | 1 |
| Potato | Produce | Mar 2019 | 2 | 2 |
| Potato | Produce | Apr 2019 | 1 | 1 |
| Cheese | Dairy | Feb 2019 | 1 | 4 |
| Cheese | Dairy | Mar 2019 | 3 | 5 |
| Cheese | Dairy | Apr 2019 | 3 | 1 |
I want each product to have a row for each month between Jan and May, with zero for the numeric columns:
| Product Name | Product Category | Month | Sales ($) | # Customers |
|---|---|---|---|---|
| Potato | Produce | Jan 2019 | 2 | 1 |
| Potato | Produce | Feb 2019 | 0 | 0 |
| Potato | Produce | Mar 2019 | 2 | 2 |
| Potato | Produce | Apr 2019 | 1 | 1 |
| Potato | Produce | May 2019 | 0 | 0 |
| Cheese | Dairy | Jan 2019 | 0 | 0 |
| Cheese | Dairy | Feb 2019 | 1 | 4 |
| Cheese | Dairy | Mar 2019 | 3 | 5 |
| Cheese | Dairy | Apr 2019 | 3 | 1 |
| Cheese | Dairy | May 2019 | 0 | 0 |
My current plan to do this is to select a unique row for each product, duplicate it for as many months as needed, iterate over the rows setting the month and numeric columns, and finally append those rows. However iterating in pandas is rarely the right approach so I am looking for a better way. Any ideas?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
