'Rolling mean, standard deviation in dbplyr

I want to set a new variable with rolling function (rolling mean, stdev...etc.) in dbplyr

Here is a database

library(odbc)
library(DBI)
library(tidyverse)
library(zoo)

con <- DBI::dbConnect(odbc::odbc(),
                      Driver    = "SQL Server", 
                      Server    = "xx.xxx.xxx.xxx",
                      Database  = "stock",
                      UID       = "userid",
                      PWD       = "userpassword")

startday = 20150101
day = tbl(con, in_schema("dbo", "LogDay")) 

enter image description here

I'd like to calculate rolling mean over 5 days, Here's my code, but it does not work

How can I solve this problem?

library(zoo)    
day %>% 
      mutate(ma5 = rollmean(priceClose, k = 5, fill = NA))

error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]키워드 'AS' 근처의 구문이  [Microsoft][ODBC SQL Server Driver][SQL Server]문을 준비할 수 
    <SQL> 'SELECT TOP 11 "logNo", "stockCode", "logDate", "priceOpen", "priceHigh", "priceLow", "priceClose", "adjRate", "volume", "amount", "numListed", "remark", "marketCap", "foreignRate", "personNetbuy", "foreignNetbuy", "instNetbuy", "financeNetbuy", "insuranceNetbuy", "toosinNetbuy", "bankNetbuy", "gitaFinanceNetbuy", "pensionNetbuy", "gitaInstNetbuy", "gitaForeignNetbuy", "samoNetbuy", "nationNetbuy", rollmean("priceClose", 5.0 AS "k", NULL AS "fill") AS "ma5"
    FROM "dbo"."LogDay"
    WHERE ("logDate" > 20150101.0)
    ORDER BY "stockCode"'
    Warning : 
    Named arguments ignored for SQL rollmean


Solution 1:[1]

The error is occurring because rollmean does not have a dbplyr translation defined nor is it an SQL command that can be used without translation. This is unsurprising as rollmean is part of the data.table library and dbplyr focuses on translating dplyr and base R commands.

Part of what you are after is a window function. dplyr has a range of window functions, as does SQL, but translation between these is not always straightforward. But there are ways to do this using commands that have translations defined.

Two possible approaches to consider:

(1) combining lag and lead

df %>%
  mutate(prev2_price = lag(priceClose, 2, order_by = date),
         prev1_price = lag(priceClose, 1, order_by = date),
         next1_price = lead(priceClose, 1, order_by = date),
         next2_price = lead(priceClose, 2, order_by = date)) %>%
  mutate(ma5 = (prev2_price + prev1_price + priceClose + next1_price + next2_price) / 5)

This approach will not scale well, but it is straightforward and easy to reason about. If you want to work within groups (e.g. separate moving averages for each stock) apply a group_by before using lag and lead.

(2) join and filter out records that are not wanted

df2 = df %>%
  select(stockCode, date, priceClose)

df %>%
  inner_join(df2, by = "stockCode", suffix = c("","_2") %>%
  filter(abs(date - date_2) <= 2) %>% # two records either side = window of width 5
  group_by(stockCode, date, priceClose) %>%
  summarise(ma5 = mean(priceClose_2)

This approach is much more general, but may be harder to reason about.

Solution 2:[2]

day = tbl(con, in_schema("dbo", "LogDay")) %>% filter(logDate > startday) %>% lazy_dt()

dayt = day %>% 
   group_by(stockCode) %>% 
   arrange(logDate) %>% 
   mutate(rise = (priceClose/lag(priceClose,1)-1)*100,
          candle = ifelse(priceClose > priceOpen, 1, 0),
          middle = ifelse(priceClose > (priceHigh + priceLow)/2, 1, 0),
          ma5 = rollmean(priceClose, k = 5, fill = NA, align = 'right'), 
          ovnprofit = lead(priceOpen,1)/priceClose,
          disparity = priceClose/ma5*100)

Solution 3:[3]

mgirlich's comment on this github issue (Request for time series function (lead, lag, rolling)) should be helpful here.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Simon.S.A.
Solution 2 Wookeun Lee
Solution 3 Richard Harrison