'R: Group by and Apply a general function to two columns
Hi I'd like to groupby two dataframe columns, and apply a function to aother two dataframe columns. For e.g.,
ticker <- c("A", "A", 'A', "B", "B", "B")
date <- c(1,1,2,1,2,1)
ret <- c(1,2,4,6,9,5)
vol <- c(3,5,1,6,2,3)
dat <- data.frame(ticker,date,ret,vol)
For each ticker and each date, I'd like to calculate its PIN.
Now, to avoid further confusion, perhaps it helps to just speak out the actual function. YZ is a function in the InfoTrad package, and YZ only accepts a dataframe with two columns. It uses some optimisation tool and returns an estimated PIN.
install.packages(InfoTrad)
library(InfoTrad)
get_pin_yz <- function(data) {
return(YZ(data[ ,c('volume_krw_buy', 'volume_krw_sell')])[['PIN']])
}
I know how to do this in R using for loop. But for loop is very computationally costly, and it might take weeks to finish running my large dataset. Thus, I would like to ask how to do this using groupby.
# output format is wide wrt long format as "dat"
dat_w <- data.frame(ticker = NA, date = NA, PIN = NA)
for (j in c("A", "B")){
for (k in c(1:2)){
subset <- dat %>% subset((ticker == j & date == k), select = c('ret', "vol"))
new_row <- data.frame(ticker = j, date = k, PIN = YZ(subset)$PIN)
dat_w <- rbind(dat_w, new_row)
}
}
dat_w <- dat_w[-1, ]
dat_w
Don't know if this can help you help me -- I know how to do this in python: I just write a function and run df.groupby(['ticker','date']).apply(function).
Finally, the wanted dataframe is:
ticker <- c('A','A','B','B')
date <- c(1,2,1,2)
PIN <- c(1.05e-17,2.81e-09,1.12e-08,5.39e-09)
data.frame(ticker,date,PIN)
Could somebody help out, please?
Thank you!
Best,
Darcy
Previous stuff (Feel free to ignore) Previously, I wrote this: My function is:
get_rv <- function(data) {
return(data[['vol']] + data[['ret']])
}
What I want is:
ticker_wanted <- c('A','A', 'B', 'B')
date_wanted <- c(1,2,1,2)
rv_wanted <- c(7,5,10,11)
df_wanted <-data.frame(ticker_wanted,date_wanted,rv_wanted)
But this is not literally what my actual function is. The vol+ret is just an example. I'm more interested in the more general case: how to groupby and apply a general function to two or more dataframes. I use the vol + ret just because I didn't want to bother others by asking them to install some potentially irrelevant package on their PC.
Solution 1:[1]
you could just do this? (with summarise as an example of your function):
ticker <- c("A", "A", 'A', "B", "B", "B")
date <- c(1,1,2,1,2,1)
ret <- c(1,-2,4,6,9,-5)
vol <- c(3,5,1,6,2,3)
df <- data.frame(ticker,date,ret,vol)
df_wanted <- get_rv(df)
get_rv <- function(data){
result <- data %>%
group_by(ticker,date) %>%
summarise(rv =sum(ret) + sum(vol)) %>%
as.data.frame()
names(result) <- c('ticker_wanted', 'date_wanted', 'rv_wanted')
return(result)
}
Solution 2:[2]
Assuming that your dataframe is as follows:
data <- data.frame(ticker,date,ret,vol)
Use split to split your dataframe into a group of dataframes bases on the values of ticker, and date.
dflist = split(data, f = list(data$ticker, data$date), drop = TRUE)
Now use lapply or sapply to run the function YZ() on each dataframe member of dflist.
pins <- lapply(dflist, function(x) YZ(x)$PIN)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | samsmit |
| Solution 2 | Friedrich Nietzsche |
