Vectorizing a sequential operation that has a conditional in Pandas
I have a Pandas dataframe with 3 columns: a series of booleans (A), a series of values (B), and a column that I want to fill (C). I also have an initial value for C.
A B C
----------------------
True 10 100
False 20 NaN
True 25 NaN
True 28 NaN
...
I want the values of column C (for C[1:]) to follow this rule:
if A[i - 1]:
C[i] = C[i - 1] * B[i] / B[i - 1]
else:
C[i] = C[i - 1]
Of course this formula cannot determine C[0], but C[0] is provided.
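For reference, the rule above can be written as a plain loop. This is a minimal sketch, assuming the four sample rows from the table; the helper name `fill_c_loop` is hypothetical and only serves as a baseline to check any faster approach against.

```python
import numpy as np
import pandas as pd

# Sample data taken from the table in the question (only the first four rows).
df = pd.DataFrame({
    "A": [True, False, True, True],
    "B": [10.0, 20.0, 25.0, 28.0],
    "C": [100.0, np.nan, np.nan, np.nan],
})


def fill_c_loop(frame):
    """Hypothetical helper: apply the rule row by row, starting from the given C[0]."""
    a = frame["A"].to_numpy(dtype=bool)
    b = frame["B"].to_numpy(dtype=float)
    c = np.empty(len(frame), dtype=float)
    c[0] = frame["C"].iloc[0]  # C[0] is provided, not computed
    for i in range(1, len(frame)):
        if a[i - 1]:
            # Previous row's flag is True: scale by the change in B.
            c[i] = c[i - 1] * b[i] / b[i - 1]
        else:
            # Previous row's flag is False: carry the value forward.
            c[i] = c[i - 1]
    return c


c_loop = fill_c_loop(df)
df["C"] = c_loop
```

Working on NumPy arrays rather than indexing the dataframe inside the loop keeps per-row overhead low, which may already be acceptable for a few hundred thousand rows.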
Is there a way to do this efficiently using vectorized operations?
What I've tried:
The following commands don't account for the sequential nature of the operation.
df.loc[df.A, 'C'] = df.C.shift(1) * df.B / df.B.shift(1)
df.loc[df.A == 0, 'C'] = df.C.shift(1)
If I were to use an apply function to compute this, I would probably have to make new shifted columns like the following and then run the apply only for rows [1:]. But how do I get the updated previous value of C?
df["s_A"] = df.A.shift(1)
df["s_B"] = df.B.shift(1)
df["s_C"] = df.C.shift(1)
df.loc[df.index[0], "s_A"] = False  # this assumption is fine for my purposes
Should this work, and is there a faster way? There may be up to 400,000 rows in total across multiple dataframes, but it is not very time-sensitive.
For clarity's sake I will mention that there are around 12 columns total, but only these three are relevant to this operation.
Is it possible to vectorize this operation? Is there another way it can be solved?
Thanks.
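One possible way to look at it: every step multiplies the previous C by a factor that depends only on A and B (either B[i] / B[i - 1] or 1), so the recurrence unrolls into C[0] times a running product of those factors. The following is a minimal sketch under that assumption, using the sample rows from the question. The variable names are illustrative, and it assumes B contains no zeros or missing values.

```python
import numpy as np
import pandas as pd

# Sample data taken from the table in the question (only the first four rows).
df = pd.DataFrame({
    "A": [True, False, True, True],
    "B": [10.0, 20.0, 25.0, 28.0],
    "C": [100.0, np.nan, np.nan, np.nan],
})

# A[i - 1] for each row; the first row has no predecessor, so treat it as False.
prev_a = df["A"].shift(1, fill_value=False).to_numpy(dtype=bool)

# B[i] / B[i - 1] for each row (NaN in the first row, which is never selected).
ratio = (df["B"] / df["B"].shift(1)).to_numpy(dtype=float)

# Per-row multiplier: the ratio where the previous flag is True, otherwise 1.
factor = np.where(prev_a, ratio, 1.0)

# C[i] = C[0] * factor[1] * ... * factor[i], computed as a cumulative product.
c_vec = df["C"].iloc[0] * np.cumprod(factor)
df["C"] = c_vec
```

Because the product is accumulated in floating point, the result may differ from a row-by-row loop in the last few digits, so comparing the two with a tolerance is safer than testing for exact equality.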
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
