'Writing data frame for future 6 months in pandas for big amount of data

I would like to optimize my program which should check the date and write extra rows which will after all show 6 months with the input date. E.g. if I have date 2021-01-01 for some ID I will get rows with the same ID and date for 6 months in a row with the started one which is JAN 2021 (this is shown in example below).

I have a data frame with two columns: ID and Demand_date as it is in input:

ID         Demand_date
001        2021-01-01
002        2016-03-01
003        2015-10-01

And the output should look like that:

ID         Demand_date
001        2021-01-01
001        2021-02-01
001        2021-03-01
001        2021-04-01
001        2021-05-01
001        2021-06-01
002        2016-03-01
002        2016-04-01
002        2016-05-01
002        2016-06-01
002        2016-07-01
002        2016-08-01
003        2015-10-01
003        2015-11-01
003        2015-12-01
003        2016-01-01
003        2016-02-01
003        2016-03-01

I have too many loops in my program and for 1M of data it is taking too long time.



Solution 1:[1]

Use nested list comprehension for create list of dictionaries and pass to DataFrame constructor with add new months by offsets.DateOffset:

N = 6

df = pd.DataFrame([{'ID':y, 'Demand_date': x+pd.DateOffset(months=i)} 
                   for y, x in zip(df['ID'], pd.to_datetime(df['Demand_date'])) 
                   for i in range(N)])
print (df)
    ID Demand_date
0    1  2021-01-01
1    1  2021-02-01
2    1  2021-03-01
3    1  2021-04-01
4    1  2021-05-01
5    1  2021-06-01
6    2  2016-03-01
7    2  2016-04-01
8    2  2016-05-01
9    2  2016-06-01
10   2  2016-07-01
11   2  2016-08-01
12   3  2015-10-01
13   3  2015-11-01
14   3  2015-12-01
15   3  2016-01-01
16   3  2016-02-01
17   3  2016-03-01

Solution 2:[2]

You could use pandas.DateOffset:

N = 6
df['Demand_date'] = [[(x+pd.DateOffset(months=i)) for i in range(N)]
                     for x in pd.to_datetime(df['Demand_date'])]
df = df.explode('Demand_date')

output:

   ID Demand_date
0   1  2021-01-01
0   1  2021-02-01
0   1  2021-03-01
0   1  2021-04-01
0   1  2021-05-01
0   1  2021-06-01
1   2  2016-03-01
1   2  2016-04-01
1   2  2016-05-01
1   2  2016-06-01
1   2  2016-07-01
1   2  2016-08-01
2   3  2015-10-01
2   3  2015-11-01
2   3  2015-12-01
2   3  2016-01-01
2   3  2016-02-01
2   3  2016-03-01

Solution 3:[3]

Given the input

DF = pd.DataFrame([], columns=["ID","DATE"])
DF["ID"] = ["1","2"]
DF["DATE"] = pd.to_datetime(["2020-01-03","2021-11-22"])

enter image description here

You can map a date range, and explode it into a new dataframe

temp = DF["DATE"].map(lambda x: pd.date_range(x, periods=3, freq='D')).reset_index()
temp = temp.rename(columns={"index":"ID"})
temp = temp.explode("DATE")
temp

That results in a daily resample;

enter image description here

I imagine you can figure out the monthly resample.
Hope that helps!

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 jezrael
Solution 2 mozway
Solution 3