'How to create an indicator function based on existence of past occurrences?
I have a time-series panel dataset that is structured in the following way: There are multiple funds that each own multiple stocks and we have a value column for the stock. As you can see the panel is not balanced. My actual dataset is very large with each fund having at least 500 stocks and different quarters being represented with some having missing quarter values.
df <- data.frame(
fund_id = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2),
stock_id = c(1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,3,3,3,3),
year_q = c("2011-03","2011-06","2011-09","2011-12","2012-03","2012-06","2011-12","2012-03","2012-06","2012-09",
"2012-12","2013-03","2013-06","2014-09","2015-03","2013-03","2013-06","2013-09","2013-12"),
value = c(1,2,1,3,4,2,1,2,3,4,2,1,3,1,1,3,2,3,1)
)
> df
fund_id stock_id year_q value
1 1 1 2011-03 1
2 1 1 2011-06 2
3 1 1 2011-09 1
4 1 1 2011-12 3
5 1 1 2012-03 4
6 1 1 2012-06 2
7 1 2 2011-12 1
8 1 2 2012-03 2
9 1 2 2012-06 3
10 1 2 2012-09 4
11 1 2 2012-12 2
12 1 2 2013-03 1
13 1 2 2013-06 3
14 2 1 2014-09 1
15 2 1 2015-03 1
16 2 3 2013-03 3
17 2 3 2013-06 2
18 2 3 2013-09 3
19 2 3 2013-12 1
I would like to create an indicator function that puts a True value if a stock was present in that quarter or any of the past 3 quarters within that fund. Here is the result that I'm looking for:
result <- data.frame(
fund_id = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2),
year_q = c("2011-03","2011-06","2011-09","2011-12","2012-03","2012-06","2012-09","2012-12","2013-03","2013-06",
"2011-03","2011-06","2011-09","2011-12","2012-03","2012-06","2012-09","2012-12","2013-03","2013-06",
"2013-03","2013-06","2013-09","2013-12","2014-03","2014-06","2014-09","2014-12","2015-03","2013-03",
"2013-06","2013-09","2013-12","2014-03","2014-06","2014-09","2014-12","2015-03"),
stock_id = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,3,3,3,3,3,3,3,3,3),
Indicator = c(T,T,T,T,T,T,T,T,T,F,F,F,F,T,T,T,T,T,T,T,F,F,F,F,F,F,T,T,T,T,T,T,T,T,T,T,F,F)
)
fund_id year_q stock_id Indicator
1 1 2011-03 1 TRUE
2 1 2011-06 1 TRUE
3 1 2011-09 1 TRUE
4 1 2011-12 1 TRUE
5 1 2012-03 1 TRUE
6 1 2012-06 1 TRUE
7 1 2012-09 1 TRUE
8 1 2012-12 1 TRUE
9 1 2013-03 1 TRUE
10 1 2013-06 1 FALSE
11 1 2011-03 2 FALSE
12 1 2011-06 2 FALSE
13 1 2011-09 2 FALSE
14 1 2011-12 2 TRUE
15 1 2012-03 2 TRUE
16 1 2012-06 2 TRUE
17 1 2012-09 2 TRUE
18 1 2012-12 2 TRUE
19 1 2013-03 2 TRUE
20 1 2013-06 2 TRUE
21 2 2013-03 1 FALSE
22 2 2013-06 1 FALSE
23 2 2013-09 1 FALSE
24 2 2013-12 1 FALSE
25 2 2014-03 1 FALSE
26 2 2014-06 1 FALSE
27 2 2014-09 1 TRUE
28 2 2014-12 1 TRUE
29 2 2015-03 1 TRUE
30 2 2013-03 3 TRUE
31 2 2013-06 3 TRUE
32 2 2013-09 3 TRUE
33 2 2013-12 3 TRUE
34 2 2014-03 3 TRUE
35 2 2014-06 3 TRUE
36 2 2014-09 3 TRUE
37 2 2014-12 3 FALSE
38 2 2015-03 3 FALSE
Please note that in some cases the quarters are not successive and there might be a missing quarter. (if this is too difficult to deal with you can also ignore this condition)
Additionally, I would also like to create a value zero for a quarter if the stock that was present in any of the previous 3 quarters, no longer exists in that fund. (this is not that important though). I've been trying multiple loop solutions but since the data is massive it doesn't work that well. My ideal solution would be something using dplyr or datatable.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
