'R - Mean values of numbers in a sequence in a dataframe column
I am trying to calculate the mean of values in a column (y) that are being sequentially counted (1, 2, 3, etc.) within another column (x). An example dataframe is shown below.
> df
x y
1 1 15
2 2 20
3 4 16
4 5 12
5 6 17
6 8 14
7 9 13
8 10 19
I would like to get a vector result that will read out the mean values of numbers in a numerical sequence. The desired vector would read: 17.5 15 15.33333
I am not sure the best way to produce this desired result, but I have tried and failed to make a for loop using diff(df[,1]) to find the breakpoints.
Any help anyone could provide would be appreciated. This is a small example dataset, but the goal is to apply it to a large dataset.
Solution 1:[1]
Create a grouping column from the diff using cumsum on a logical vector
with(df, tapply(y, cumsum(c(TRUE, diff(x) != 1)), FUN = mean))
-output
1 2 3
17.50000 15.00000 15.33333
data
df <- structure(list(x = c(1L, 2L, 4L, 5L, 6L, 8L, 9L, 10L), y = c(15L,
20L, 16L, 12L, 17L, 14L, 13L, 19L)), class = "data.frame",
row.names = c("1",
"2", "3", "4", "5", "6", "7", "8"))
Solution 2:[2]
Update: code reduced 2 lines:
df %>%
group_by(id_Group =cumsum(x-lag(x, default = x[1])>=2)) %>%
summarise(mean = mean(y, na.rm=TRUE)) %>%
pull(mean)
Here is a dplyr version:
- calculate the difference between the lagged
x - create group with
cumsum(diff>=2) - calculate mean and
pullthe vector.
library(dplyr)
df %>%
mutate(diff= x-lag(x, default = x[1])) %>%
group_by(id_Group =cumsum(diff>=2)) %>%
mutate(mean = mean(y, na.rm=TRUE)) %>%
slice(1) %>%
pull(mean)
[1] 17.50000 15.00000 15.33333
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | akrun |
| Solution 2 | TarJae |
