'Estimating the percentage of common set elements over combined past periods in a panel
I have a time-series panel dataset that is structured in the following way: There are 2 funds that each own different stocks at each time period.
df <- data.frame(
fund_id = c(1,1,1,1,1,1,1,1, 1, 2,2,2,2),
time_Q = c(1,1,1,2,2,2,2,3, 3, 1,1,2,2),
stock_id = c("A", "B", "C", "A", "C", "D", "E", "D", "E", "A", "B", "B", "C")
)
> df
fund_id time_Q stock_id
1 1 1 A
2 1 1 B
3 1 1 C
4 1 2 A
5 1 2 C
6 1 2 D
7 1 2 E
8 1 3 D
9 1 3 E
10 2 1 A
11 2 1 B
12 2 2 B
13 2 2 C
For each fund, I would like to calculate the percentage of stocks held in that current time_Q that were also ever held in any of the previous one to 2 quarters. So basically for every fund and every time_Q, I would like to have 2 columns with past 1 time_Q, past 1-2 time_Q which show what percentage of stocks held in that time were also present in any of that past time_Qs. Here is what the result should look like:
result <- data.frame(
fund_id = c(1,1,1,2,2),
time_Q = c(1,2,3,1,2),
past_1Q = c("NA",0.5,1,"NA",0.5),
past_2Q = c("NA",0.5,1,"NA",0.5)
)
> result
fund_id time_Q past_1Q past_1_2Q
1 1 1 NA NA
2 1 2 0.5 0.5
3 1 3 1 1
4 2 1 NA NA
5 2 2 0.5 0.5
I already asked a similar question here, but now I'm looking for common elements across any of the past lagged periods. I'm looking for a dplyr or data.table scalable solution where I can have around 12 past quarters and deal with multiple funds and stocks and time periods.
Thanks in advance!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
