'How to use select() inside between() inside filter() to subset data dplyr r
What I have: I have a large dataset with columns for a few measured variables, SiteID and Seconds.
What I would like to do:
For each of my study sites (SiteID), I would like to filter out rows for a specific time period (Seconds). I have the following code, which works only if the values in Seconds are unique to each study site, for example:
# create example data
data <- data.frame(SiteID = rep(c("A", "B", "C", "D", "E"), each = 10),
Seconds = rep(c(1:50)))
# filter the data
data.subset <- data %>%
filter(between(Seconds, 2, 8) | # for SiteID = "A"
filter(between(Seconds, 14, 16) | # for SiteID = "B"
filter(between(Seconds, 23, 30) | # for SiteID = "C"
filter(between(Seconds, 36, 39) | # for SiteID = "D"
filter(between(Seconds, 44, 49)) # for SiteID = "E"
The code above would give me what I want, which is:
SiteID Seconds
A 2
A 3
A 4
A 5
A 6
A 7
A 8
B 14
B 15
B 16
C 23
C 24
C 25
C 26
C 27
C 28
C 29
C 30
D 36
D 37
D 38
D 39
E 44
E 45
E 46
E 47
E 48
E 49
What problem I'm facing: The problem I have is that in my dataset, the range of Seconds for each SiteID are not unique as in the example above, but looks more like the following:
# create example data
data <- data.frame(SiteID = rep(c("A", "B", "C", "D", "E"), each = 10),
Seconds = rep(c(1:10)))
But I would like to subset the data as such: For site A, keep only 2-8 seconds. For site B, keep only 3-6 seconds. For site C, keep only 8-10 seconds. For site D, keep only 1-6 seconds. For site E, keep only 2-9 seconds.
I need help modifying the dplyr code above for me to do this. If I used the same code as above, the data set remains exactly the same and is not filtered the way I want it to because the numbers for each SiteID are in a similar range. I have tried using select() as the first term inside between() to filter out only the rows with a specific SiteID, but it is returning an error:
# modified code
data.subset <- data %>%
filter(between(select(data, Seconds & SiteID == "A"), 2, 8) |
filter(between(select(data, Seconds & SiteID == "B"), 3, 6) |
filter(between(select(data, Seconds & SiteID == "C"), 8, 10) |
filter(between(select(data, Seconds & SiteID == "D"), 1, 6) |
filter(between(select(data, Seconds & SiteID == "E"), 2, 9))
# error message:
Error: Problem with `filter()` input `..1`.
i Input `..1` is `between(...)`.
x Must subset columns with a valid subscript vector.
x Subscript has the wrong type `logical`.
i It must be numeric or character.
Basically if I were to use base R, I would need to do something like the following as the first term inside the between() function in place of just Seconds:
data$Seconds[data$SiteID == "A"]
I hope it is clear what I'm trying to do with dplyr. My desired outcome would look like this:
SiteID Seconds
A 2
A 3
A 4
A 5
A 6
A 7
A 8
B 3
B 4
B 5
B 6
C 8
C 9
C 10
D 1
D 2
D 3
D 4
D 5
D 6
E 2
E 3
E 4
E 5
E 6
E 7
E 8
E 9
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
