'Recalculate values between row and convert quarterly data to monthly basis

I have two part of questions that I would like to as for help, it is on subtracting between rows based on each items and year quarter, and then to spread my data into an average of monthly basis. I just have no idea how to write code to operate in such a way.

Here is a dataframe of data. The actual data will consist of hundreds of items and nearly 50 columns. Time range is 20 years of historical data. Number is each column has been calculated as cumulative value. For example Q1 is data from Jan to Mar, Q2 is data from Jan to Jun. All items are recorded in quarter basis.

data = [['kitkat', '2020Q4', 500000, 350000], ['kitkat', '2020Q3', 400000, 250000], ['kitkat', '2020Q2', 200000, 100000], ['kitkat', '2020Q1', 100000, 50000],
       ['kitkat', '2019Q4', 700000, 450000],['kitkat', '2019Q3', 500000, 300000],['kitkat', '2019Q1', 300000, 150000], 
       ['oreo', '2020Q4', 500000, 350000], ['oreo', '2020Q3', 400000, 250000], ['oreo', '2020Q2', 200000, 100000], ['oreo', '2020Q1', 100000, 50000],
       ['oreo', '2019Q4', 700000, 450000],['oreo', '2019Q3', 500000, 300000],['oreo', '2019Q2', 300000, 150000],['oreo', '2019Q1', 200000, 100000]]
df = pd.DataFrame(data, columns = ['Item', 'CloseBook', "Income", "Expense"])

Part 1
So for each item and for each year. The number in Q4 should contain only data from Oct to Dec. Therefore, the calculate show follow this rule for every year:
Q4 = Q4-Q3
Q3 = Q3-Q2
Q2 = Q2-Q1
Q1 = Q1.
But there is possibility that some quarter could be missing. If that is the case, it must subtract previous previous quarter eg. if Q2 is missing, then Q3-Q1. The result from this first part should be enter image description here

Part 2
Continue with the calculation from part 1. I need to add new column that will record time in monthly basis. And the result from previous calculation need to be divided by 3 or 6 or 9, based on how it has been calculated. The requirement will be that if
it calculate as Q4-Q3/Q3-Q2/Q2-Q1 then it must be divide by 3,
if calculate as Q4-Q2/Q3-Q1 then it must be divide by 6,
if calculate as Q4-Q1 then it must be divide by 9.

The final dataframe, for kitkat item, should show like this enter image description here

Or maybe, is there a shorter way, a more proper way to write code to calculate from raw data to the final result that I need?



Solution 1:[1]

The solution use groupby().apply() so it's not going to be fast. If you have millions of row, we need to find a more performant solution:

def f(group):
    idx = pd.IndexSlice
    cols = ["Income", "Expense"]

    # Part 1
    # Every quarter except Q1 is the delta between current row and previous row
    group.loc[idx[:, :, 2:], cols] = group[cols].diff()

    # Part 2
    year = group.index.get_level_values("Year")
    quarter = group.index.get_level_values("Quarter").to_series()

    times = []
    divisors = []
    for y, q, prev_q in zip(year, quarter, quarter.shift()):
        if q == 1:
            # Q1 always have 3 months
            m = pd.date_range(f"{y}-01-01", f"{y}-03-01", freq="MS")
        else:
            # Other quarters have varying number of months due to gap with
            # previous quarter
            m = pd.date_range(f"{y}-{int(prev_q)*3+1}-01", f"{y}-{q*3}-01", freq="MS")

        times.append(m)
        
        # If the gap is n month from previous quarter, the divisors are n
        # repeated n times
        divisors.append([len(m)] * len(m))

    group["Time"] = times
    group["Divisor"] = divisors
    group = group.explode(["Time", "Divisor"])

    group[cols] /= group["Divisor"].to_numpy()[:, None]
    return group

s = pd.PeriodIndex(df["CloseBook"], freq="Q")
result = (
    df.set_index(["Item", s.year.rename("Year"), s.quarter.rename("Quarter")])
    .sort_index()
    .groupby(level=[0, 1])
    .apply(f)
    .droplevel([1,2,3,4])
    .drop(columns="Divisor")
    .reset_index()
)

Result (trim and reorder columns and index to taste):

      Item CloseBook        Income       Expense       Time
0   kitkat    2019Q1      100000.0       50000.0 2019-01-01
1   kitkat    2019Q1      100000.0       50000.0 2019-02-01
2   kitkat    2019Q1      100000.0       50000.0 2019-03-01
3   kitkat    2019Q3  33333.333333       25000.0 2019-04-01
4   kitkat    2019Q3  33333.333333       25000.0 2019-05-01
5   kitkat    2019Q3  33333.333333       25000.0 2019-06-01
6   kitkat    2019Q3  33333.333333       25000.0 2019-07-01
7   kitkat    2019Q3  33333.333333       25000.0 2019-08-01
8   kitkat    2019Q3  33333.333333       25000.0 2019-09-01
9   kitkat    2019Q4  66666.666667       50000.0 2019-10-01
10  kitkat    2019Q4  66666.666667       50000.0 2019-11-01
11  kitkat    2019Q4  66666.666667       50000.0 2019-12-01
12  kitkat    2020Q1  33333.333333  16666.666667 2020-01-01
13  kitkat    2020Q1  33333.333333  16666.666667 2020-02-01
14  kitkat    2020Q1  33333.333333  16666.666667 2020-03-01
15  kitkat    2020Q2  33333.333333  16666.666667 2020-04-01
16  kitkat    2020Q2  33333.333333  16666.666667 2020-05-01
17  kitkat    2020Q2  33333.333333  16666.666667 2020-06-01
18  kitkat    2020Q3  66666.666667       50000.0 2020-07-01
19  kitkat    2020Q3  66666.666667       50000.0 2020-08-01
20  kitkat    2020Q3  66666.666667       50000.0 2020-09-01
21  kitkat    2020Q4  33333.333333  33333.333333 2020-10-01
22  kitkat    2020Q4  33333.333333  33333.333333 2020-11-01
23  kitkat    2020Q4  33333.333333  33333.333333 2020-12-01
24    oreo    2019Q1  66666.666667  33333.333333 2019-01-01
25    oreo    2019Q1  66666.666667  33333.333333 2019-02-01
26    oreo    2019Q1  66666.666667  33333.333333 2019-03-01
27    oreo    2019Q2  33333.333333  16666.666667 2019-04-01
28    oreo    2019Q2  33333.333333  16666.666667 2019-05-01
29    oreo    2019Q2  33333.333333  16666.666667 2019-06-01
30    oreo    2019Q3  66666.666667       50000.0 2019-07-01
31    oreo    2019Q3  66666.666667       50000.0 2019-08-01
32    oreo    2019Q3  66666.666667       50000.0 2019-09-01
33    oreo    2019Q4  66666.666667       50000.0 2019-10-01
34    oreo    2019Q4  66666.666667       50000.0 2019-11-01
35    oreo    2019Q4  66666.666667       50000.0 2019-12-01
36    oreo    2020Q1  33333.333333  16666.666667 2020-01-01
37    oreo    2020Q1  33333.333333  16666.666667 2020-02-01
38    oreo    2020Q1  33333.333333  16666.666667 2020-03-01
39    oreo    2020Q2  33333.333333  16666.666667 2020-04-01
40    oreo    2020Q2  33333.333333  16666.666667 2020-05-01
41    oreo    2020Q2  33333.333333  16666.666667 2020-06-01
42    oreo    2020Q3  66666.666667       50000.0 2020-07-01
43    oreo    2020Q3  66666.666667       50000.0 2020-08-01
44    oreo    2020Q3  66666.666667       50000.0 2020-09-01
45    oreo    2020Q4  33333.333333  33333.333333 2020-10-01
46    oreo    2020Q4  33333.333333  33333.333333 2020-11-01
47    oreo    2020Q4  33333.333333  33333.333333 2020-12-01

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Code Different