'Cumulative Sales of a Last Period

I have the following code that starts like this:

# Import Libraies
import numpy as np 
import pandas as pd
import datetime as dt

#Conexion to Drive
from google.colab import drive
drive.mount('/content/drive')

ruta = '/content/drive/MyDrive/example.csv'
df = pd.read_csv(ruta)
df.head(10)

The file that I import, you can download it from here: Data

And it looks like this: enter image description here

Then what I do is group the values ​​and then create a metric called Rolling Year (RY_ACTUAL) and (RY_LAST), these help me to know the sales of each category, for example the Blue category, twelve months ago. This metric works fine:

# ROLLING YEAR
# I want to make a Roling Year for each category. Thats mean how much sell each category since 12 moths ago TO current month

# RY_ACTUAL One year have 12 months so I pass as parameter in the rolling 12
f = lambda x:x.rolling(12).sum()
df_group["RY_ACTUAL"]  = df_group.groupby(["CATEGORY"])['Sales'].apply(f)

# RY_24 I create a rolling with 24 as parameter to compare actual RY vs last RY
f_1 = lambda x:x.rolling(24).sum()
df_group["RY_24"]  = df_group.groupby(["CATEGORY"])['Sales'].apply(f_1)

#RY_LAST Substract RY_24 - RY_Actual to get the correct amount. Thats mean amount of RY vs the amount of RY-1
df_group["RY_LAST"]  = df_group["RY_24"] - df_group["RY_ACTUAL"]

My problem is in the metric called Year To Date, which is nothing more than the accumulated sales of each category from JANUARY to the month where you read the table, for example if I stop in March 2015, know how much each category sold in January to March. The column I created called YTD_ACTUAL does just that for me and I achieve it like this:

# YTD_ACTUAL
df_group['YTD_ACTUAL'] = df_group.groupby(["CATEGORY","DATE"]).Sales.cumsum()

However, what I have not been able to do is the YTD_LAST column, that is, from the past period, which reminding the previous example where it was stopped in March 2015, suppose in the blue category, it should return to me how much was the accumulated sales for the blue category from JANUARY to MARCH but 2014 year.

My try >.<

#YTD_LAST
df_group['YTD_LAST'] = df_group.groupby(["CATEGORY", "DATE"]).Sales.apply(f)

Could someone help me to make this column correctly?

Thank you in advance, community!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source