'Recalculate values between row and convert quarterly data to monthly basis
I have two part of questions that I would like to as for help, it is on subtracting between rows based on each items and year quarter, and then to spread my data into an average of monthly basis. I just have no idea how to write code to operate in such a way.
Here is a dataframe of data. The actual data will consist of hundreds of items and nearly 50 columns. Time range is 20 years of historical data. Number is each column has been calculated as cumulative value. For example Q1 is data from Jan to Mar, Q2 is data from Jan to Jun. All items are recorded in quarter basis.
data = [['kitkat', '2020Q4', 500000, 350000], ['kitkat', '2020Q3', 400000, 250000], ['kitkat', '2020Q2', 200000, 100000], ['kitkat', '2020Q1', 100000, 50000],
['kitkat', '2019Q4', 700000, 450000],['kitkat', '2019Q3', 500000, 300000],['kitkat', '2019Q1', 300000, 150000],
['oreo', '2020Q4', 500000, 350000], ['oreo', '2020Q3', 400000, 250000], ['oreo', '2020Q2', 200000, 100000], ['oreo', '2020Q1', 100000, 50000],
['oreo', '2019Q4', 700000, 450000],['oreo', '2019Q3', 500000, 300000],['oreo', '2019Q2', 300000, 150000],['oreo', '2019Q1', 200000, 100000]]
df = pd.DataFrame(data, columns = ['Item', 'CloseBook', "Income", "Expense"])
Part 1
So for each item and for each year. The number in Q4 should contain only data from Oct to Dec. Therefore, the calculate show follow this rule for every year:
Q4 = Q4-Q3
Q3 = Q3-Q2
Q2 = Q2-Q1
Q1 = Q1.
But there is possibility that some quarter could be missing. If that is the case, it must subtract previous previous quarter eg. if Q2 is missing, then Q3-Q1.
The result from this first part should be

Part 2
Continue with the calculation from part 1. I need to add new column that will record time in monthly basis. And the result from previous calculation need to be divided by 3 or 6 or 9, based on how it has been calculated.
The requirement will be that if
it calculate as Q4-Q3/Q3-Q2/Q2-Q1 then it must be divide by 3,
if calculate as Q4-Q2/Q3-Q1 then it must be divide by 6,
if calculate as Q4-Q1 then it must be divide by 9.
The final dataframe, for kitkat item, should show like this

Or maybe, is there a shorter way, a more proper way to write code to calculate from raw data to the final result that I need?
Solution 1:[1]
The solution use groupby().apply() so it's not going to be fast. If you have millions of row, we need to find a more performant solution:
def f(group):
idx = pd.IndexSlice
cols = ["Income", "Expense"]
# Part 1
# Every quarter except Q1 is the delta between current row and previous row
group.loc[idx[:, :, 2:], cols] = group[cols].diff()
# Part 2
year = group.index.get_level_values("Year")
quarter = group.index.get_level_values("Quarter").to_series()
times = []
divisors = []
for y, q, prev_q in zip(year, quarter, quarter.shift()):
if q == 1:
# Q1 always have 3 months
m = pd.date_range(f"{y}-01-01", f"{y}-03-01", freq="MS")
else:
# Other quarters have varying number of months due to gap with
# previous quarter
m = pd.date_range(f"{y}-{int(prev_q)*3+1}-01", f"{y}-{q*3}-01", freq="MS")
times.append(m)
# If the gap is n month from previous quarter, the divisors are n
# repeated n times
divisors.append([len(m)] * len(m))
group["Time"] = times
group["Divisor"] = divisors
group = group.explode(["Time", "Divisor"])
group[cols] /= group["Divisor"].to_numpy()[:, None]
return group
s = pd.PeriodIndex(df["CloseBook"], freq="Q")
result = (
df.set_index(["Item", s.year.rename("Year"), s.quarter.rename("Quarter")])
.sort_index()
.groupby(level=[0, 1])
.apply(f)
.droplevel([1,2,3,4])
.drop(columns="Divisor")
.reset_index()
)
Result (trim and reorder columns and index to taste):
Item CloseBook Income Expense Time
0 kitkat 2019Q1 100000.0 50000.0 2019-01-01
1 kitkat 2019Q1 100000.0 50000.0 2019-02-01
2 kitkat 2019Q1 100000.0 50000.0 2019-03-01
3 kitkat 2019Q3 33333.333333 25000.0 2019-04-01
4 kitkat 2019Q3 33333.333333 25000.0 2019-05-01
5 kitkat 2019Q3 33333.333333 25000.0 2019-06-01
6 kitkat 2019Q3 33333.333333 25000.0 2019-07-01
7 kitkat 2019Q3 33333.333333 25000.0 2019-08-01
8 kitkat 2019Q3 33333.333333 25000.0 2019-09-01
9 kitkat 2019Q4 66666.666667 50000.0 2019-10-01
10 kitkat 2019Q4 66666.666667 50000.0 2019-11-01
11 kitkat 2019Q4 66666.666667 50000.0 2019-12-01
12 kitkat 2020Q1 33333.333333 16666.666667 2020-01-01
13 kitkat 2020Q1 33333.333333 16666.666667 2020-02-01
14 kitkat 2020Q1 33333.333333 16666.666667 2020-03-01
15 kitkat 2020Q2 33333.333333 16666.666667 2020-04-01
16 kitkat 2020Q2 33333.333333 16666.666667 2020-05-01
17 kitkat 2020Q2 33333.333333 16666.666667 2020-06-01
18 kitkat 2020Q3 66666.666667 50000.0 2020-07-01
19 kitkat 2020Q3 66666.666667 50000.0 2020-08-01
20 kitkat 2020Q3 66666.666667 50000.0 2020-09-01
21 kitkat 2020Q4 33333.333333 33333.333333 2020-10-01
22 kitkat 2020Q4 33333.333333 33333.333333 2020-11-01
23 kitkat 2020Q4 33333.333333 33333.333333 2020-12-01
24 oreo 2019Q1 66666.666667 33333.333333 2019-01-01
25 oreo 2019Q1 66666.666667 33333.333333 2019-02-01
26 oreo 2019Q1 66666.666667 33333.333333 2019-03-01
27 oreo 2019Q2 33333.333333 16666.666667 2019-04-01
28 oreo 2019Q2 33333.333333 16666.666667 2019-05-01
29 oreo 2019Q2 33333.333333 16666.666667 2019-06-01
30 oreo 2019Q3 66666.666667 50000.0 2019-07-01
31 oreo 2019Q3 66666.666667 50000.0 2019-08-01
32 oreo 2019Q3 66666.666667 50000.0 2019-09-01
33 oreo 2019Q4 66666.666667 50000.0 2019-10-01
34 oreo 2019Q4 66666.666667 50000.0 2019-11-01
35 oreo 2019Q4 66666.666667 50000.0 2019-12-01
36 oreo 2020Q1 33333.333333 16666.666667 2020-01-01
37 oreo 2020Q1 33333.333333 16666.666667 2020-02-01
38 oreo 2020Q1 33333.333333 16666.666667 2020-03-01
39 oreo 2020Q2 33333.333333 16666.666667 2020-04-01
40 oreo 2020Q2 33333.333333 16666.666667 2020-05-01
41 oreo 2020Q2 33333.333333 16666.666667 2020-06-01
42 oreo 2020Q3 66666.666667 50000.0 2020-07-01
43 oreo 2020Q3 66666.666667 50000.0 2020-08-01
44 oreo 2020Q3 66666.666667 50000.0 2020-09-01
45 oreo 2020Q4 33333.333333 33333.333333 2020-10-01
46 oreo 2020Q4 33333.333333 33333.333333 2020-11-01
47 oreo 2020Q4 33333.333333 33333.333333 2020-12-01
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Code Different |
