Quantile in For Loop returning incorrect results

I am fairly new to R and not sure why my quantile code returns two different results. When I put it in a for loop to calculate quantiles for all columns, it returns incorrect values; when I run it individually per column, the results are different (and correct).

Sample data:

set.seed(1)

dt_sample <- data.frame(
  group = rep(LETTERS[1:3], 10),
  var1 = rnorm(30),
  var2 = rnorm(30),
  var3 = rnorm(30)
)

Code for individual column:

library(dplyr)

var1_quantile <- dt_sample %>%
  group_by(group) %>%
  summarize(quant25 = quantile(var1, probs = 0), 
            quant50 = quantile(var1, probs = .25),
            quant75 = quantile(var1, probs = .5),
            quant100 = quantile(var1, probs = 1))

Results:

group quant25 quant50 quant75 quant100
A       -1.47  -0.542   0.221     1.60
B       -2.21  -0.0461  0.129     1.51
C       -1.99  -0.654   0.404     1.12

For Loop code, for all columns:

library(dplyr)

for(i in dt_sample[,c(2:4)]){
  
  loop1 <- dt_sample %>%
    group_by(group) %>%
    summarize(quant25 = quantile(i, probs = 0), 
              quant50 = quantile(i, probs = .25),
              quant75 = quantile(i, probs = .5),
              quant100 = quantile(i, probs = 1))
  
  print(loop1)
}

Results:

group quant25 quant50 quant75 quant100
A       -2.21  -0.435   0.257     1.60
B       -2.21  -0.435   0.257     1.60
C       -2.21  -0.435   0.257     1.60

group quant25 quant50 quant75 quant100
A       -1.38  -0.388  -0.0566    1.98
B       -1.38  -0.388  -0.0566    1.98
C       -1.38  -0.388  -0.0566    1.98

group quant25 quant50 quant75 quant100
A       -1.80  -0.537   0.114     2.40
B       -1.80  -0.537   0.114     2.40
C       -1.80  -0.537   0.114     2.40

Column #2 is var1. For group A, the individual calculation gives -1.47, -0.542, 0.221, 1.60, but inside the loop it gives -2.21, -0.435, 0.257, 1.60, and the same values are repeated for every group.

Could anyone please help review? I used the same code inside the for loop, with "i" defined to pick each column from dt_sample. What's causing this?
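One likely explanation, sketched here in base R and assuming the sample data above: `for (i in dt_sample[, c(2:4)])` binds `i` to each whole column as a plain 30-element vector, so `quantile(i, ...)` inside `summarize()` never sees the grouping and returns the whole-column quantiles for every group. The names `whole` and `by_group` are illustrative only.

```r
set.seed(1)

dt_sample <- data.frame(
  group = rep(LETTERS[1:3], 10),
  var1 = rnorm(30),
  var2 = rnorm(30),
  var3 = rnorm(30)
)

probs <- c(0, .25, .5, 1)

# On the first pass of the loop, `i` is simply the full var1 column.
i <- dt_sample[, 2]
length(i)  # all 30 rows, not the 10 rows of one group

# Quantiles of the whole column: what the loop reports for every group.
whole <- quantile(i, probs = probs)

# Quantiles computed separately within each group: what was intended.
by_group <- tapply(dt_sample$var1, dt_sample$group, quantile, probs = probs)

print(whole)
print(by_group)
```

If this reading is right, `whole` should line up with the repeated rows in the first loop output, while `by_group` should line up with the individual calculation.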



Solution 1:[1]

A solution using data.table:

library(data.table)

setDT(dt)

dt[, as.list(quantile(.SD, probs = c(0, .25, .5, 1), na.rm = TRUE)), by = group]

Sample data used above (define dt before running setDT(dt)):

set.seed(1)

dt <- data.frame(
  group = rep(LETTERS[1:3], 10),
  var1 = rnorm(30),
  var2 = rnorm(30),
  var3 = rnorm(30)
)

Results:

   group     0%     25%     50%  100%
1:     A -1.805 -0.3721 0.05117 2.402
2:     B -2.215 -0.1491 0.25658 1.980
3:     C -1.989 -0.6587 0.07718 1.125
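The data.table output above appears to pool var1 to var3 within each group (its minima and maxima match the extremes reported across the three columns in the question). For per-column quantiles within each group, a minimal dplyr sketch is to loop over column names rather than column values and look each one up with the `.data` pronoun, so the lookup happens inside each group. The list `results` and the column names `q0`, `q25`, `q50`, `q100` are my own choices, not from the original post.

```r
library(dplyr)

set.seed(1)

dt_sample <- data.frame(
  group = rep(LETTERS[1:3], 10),
  var1 = rnorm(30),
  var2 = rnorm(30),
  var3 = rnorm(30)
)

# Collect one grouped summary per column, keyed by column name.
results <- list()

for (nm in names(dt_sample)[2:4]) {
  results[[nm]] <- dt_sample %>%
    group_by(group) %>%
    summarize(
      # .data[[nm]] is evaluated per group, unlike a pre-extracted vector
      q0   = quantile(.data[[nm]], probs = 0),
      q25  = quantile(.data[[nm]], probs = .25),
      q50  = quantile(.data[[nm]], probs = .5),
      q100 = quantile(.data[[nm]], probs = 1)
    )
}

print(results[["var1"]])
```

Under these assumptions, `results[["var1"]]` should agree with the individual var1 calculation in the question, and the rows should differ between groups.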

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Merijn van Tilborg