Quantile in for loop returning incorrect results
I am fairly new to R and not sure why my quantile code returns two different results. When I put it in a for loop to calculate quantiles for all columns, it returns incorrect values. When I run it individually per column, the results are different (and correct).
Sample data:
set.seed(1)
dt_sample <- data.frame(
group = rep(LETTERS[1:3], 10),
var1 = rnorm(30),
var2 = rnorm(30),
var3 = rnorm(30)
)
Code for individual column:
var1_quantile <- dt_sample %>%
group_by(group) %>%
summarize(quant25 = quantile(var1, probs = 0),
quant50 = quantile(var1, probs = .25),
quant75 = quantile(var1, probs = .5),
quant100 = quantile(var1, probs = 1))
Results:
group quant25 quant50 quant75 quant100
A       -1.47  -0.542   0.221     1.60
B       -2.21 -0.0461   0.129     1.51
C       -1.99  -0.654   0.404     1.12
For Loop code, for all columns:
library(dplyr)
for(i in dt_sample[,c(2:4)]){
loop1 <- dt_sample %>%
group_by(group) %>%
summarize(quant25 = quantile(i, probs = 0),
quant50 = quantile(i, probs = .25),
quant75 = quantile(i, probs = .5),
quant100 = quantile(i, probs = 1))
print(loop1)
}
Results:
group quant25 quant50 quant75 quant100
A       -2.21  -0.435   0.257     1.60
B       -2.21  -0.435   0.257     1.60
C       -2.21  -0.435   0.257     1.60

group quant25 quant50 quant75 quant100
A       -1.38  -0.388 -0.0566     1.98
B       -1.38  -0.388 -0.0566     1.98
C       -1.38  -0.388 -0.0566     1.98

group quant25 quant50 quant75 quant100
A       -1.80  -0.537   0.114     2.40
B       -1.80  -0.537   0.114     2.40
C       -1.80  -0.537   0.114     2.40
Column #2 is var1. For group A, the individual calculation gives -1.47, -0.542, 0.221, 1.60, but the loop gives -2.21, -0.435, 0.257, 1.60, and it gives those same values for every group.
Could anyone please help review? I put the same code inside the for loop, with "i" defined to pick each value column from dt_sample. What's causing this?
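A likely cause, judging from the identical rows in the loop output: `i` is the full 30-value column taken from the data frame outside the pipeline, so `group_by()` has nothing to split and `quantile(i, ...)` is computed on the whole column for every group. A minimal sketch of one way around this is to loop over column names and look each one up with dplyr's `.data` pronoun, which is resolved within each group. The `results` list and the `col` loop variable are names introduced here for illustration. The output column names are kept from the question, even though the probabilities are 0, .25, .5 and 1.

```r
library(dplyr)

set.seed(1)
dt_sample <- data.frame(
  group = rep(LETTERS[1:3], 10),
  var1 = rnorm(30),
  var2 = rnorm(30),
  var3 = rnorm(30)
)

# Loop over column *names* rather than the column vectors themselves.
# .data[[col]] is resolved inside each group, so quantile() only sees
# that group's rows.
results <- list()  # hypothetical container, one summary table per column
for (col in c("var1", "var2", "var3")) {
  results[[col]] <- dt_sample %>%
    group_by(group) %>%
    summarize(quant25  = quantile(.data[[col]], probs = 0),
              quant50  = quantile(.data[[col]], probs = .25),
              quant75  = quantile(.data[[col]], probs = .5),
              quant100 = quantile(.data[[col]], probs = 1))
  print(results[[col]])
}
```

With this change each group gets its own values, and the var1 table can be compared directly against the individual calculation.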
Solution 1:[1]
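A solution using data.table (dt is the sample data shown further down):
library(data.table)
setDT(dt)  # convert the data.frame to a data.table by reference
dt[, as.list(quantile(.SD, probs = c(0, .25, .5, 1), na.rm = TRUE)), by = group]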
Sample data:
set.seed(1)
dt <- data.frame(
group = rep(LETTERS[1:3], 10),
var1 = rnorm(30),
var2 = rnorm(30),
var3 = rnorm(30)
)
Results:
group 0% 25% 50% 100%
1: A -1.805 -0.3721 0.05117 2.402
2: B -2.215 -0.1491 0.25658 1.980
3: C -1.989 -0.6587 0.07718 1.125
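The table above has one row per group, and its extremes (for example -1.805 and 2.402 for group A) line up with values from different columns in the question. This suggests that `quantile(.SD, ...)` pools var1, var2 and var3 within each group. If per-column quantiles are wanted instead, as in the question, a minimal sketch is to apply `quantile()` to each column of `.SD` with `lapply()`. The names `probs`, `value_cols` and `per_col` are introduced here for illustration and are not part of the original answer.

```r
library(data.table)

set.seed(1)
dt <- data.table(
  group = rep(LETTERS[1:3], 10),
  var1 = rnorm(30),
  var2 = rnorm(30),
  var3 = rnorm(30)
)

probs <- c(0, .25, .5, 1)                 # same probabilities as the answer
value_cols <- c("var1", "var2", "var3")   # columns to summarize

# For each group, return the probabilities plus one quantile vector per
# column. The result is in long form: one row per group and probability.
per_col <- dt[,
  c(list(prob = probs),
    lapply(.SD, quantile, probs = probs, names = FALSE)),
  by = group, .SDcols = value_cols]
print(per_col)
```

The long layout (four rows per group) avoids having to invent a name for every column and probability pair. It can be reshaped with `dcast()` if a wide table is preferred.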
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Merijn van Tilborg |
