Filter rows by condition and add n rows before/after selected row
I have grouped data and want to select the rows that fulfill a certain condition (this works with the code below), but I also want to add n rows before or after them. The code below gives me two rows per group; now I want, for example, to add the one row before the first selected row and the one row after the second. I could not find a function like "add one row", so is there any way to include this in the following code?
daf1 <- df %>%
  setDT(df) %>%
  dplyr::mutate(row_id = row_number()) %>%
  dplyr::group_by(PATIENT.ID) %>%
  dplyr::mutate(first_yes = cumsum(Caffeinefactor == "yes"),
                last_no = (Caffeinefactor == "no") * row_number(),
                last_no = ifelse(first_yes == 0, last_no, 0)) %>%
  dplyr::filter((first_yes == 1 & Caffeine > 0) | last_no == max(last_no)) %>%
  dplyr::select(-first_yes, -last_no)
DF = structure(list(PATIENT.ID = c(210625L, 210625L, 210625L, 210625L,
210625L, 210625L, 210625L, 210625L, 210625L, 210625L, 210625L,
210625L, 210625L, 210625L, 210625L, 210625L, 210625L, 220909L,
220909L, 220909L, 220909L, 220909L, 220909L, 220909L, 220909L,
220909L, 220909L, 221179L, 221179L, 221179L, 221179L, 221179L,
221179L, 221179L, 221179L, 221179L, 221179L, 221179L, 221179L,
221179L, 221179L, 301705L, 301705L, 301705L, 301705L, 301705L,
301705L, 301705L, 301705L, 301705L, 301705L, 301705L, 301705L,
301705L, 301705L, 301705L, 303926L, 303926L, 303926L, 303926L
), PATIENT.TREATMENT.NUMBER = c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 13L, 17L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 12L, 13L, 14L, 15L, 16L, 1L, 2L, 3L, 4L), Caffeinefactor = c("no",
"no", "no", "no", "yes", "yes", "yes", "no", "yes", "yes", "yes",
"yes", "yes", "no", "no", "yes", "yes", "yes", "no", "yes",
"yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no",
"no", "no", "no", "no", "no", "no", "yes", "yes", "yes", "yes",
"yes", "no", "no", "no", "no", "no", "no", "yes", "no", "yes",
"yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no"
)), row.names = c(NA, -60L), class = c("data.table", "data.frame"))
What the output looks like now:
PATIENT.ID | Caffeinefactor |
---|---|
210625L | no |
210625L | yes |
220909L | no |
220909L | yes |
301705L | no |
301705L | yes |
The output should look like this if one row is added before and after:
PATIENT.ID | Caffeinefactor |
---|---|
210625L | no |
210625L | no |
210625L | yes |
210625L | yes |
220909L | no |
220909L | yes |
220909L | yes |
301705L | no |
301705L | no |
301705L | yes |
301705L | yes |
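A minimal sketch of one way this could be done, not a tested solution for the real data: instead of filtering directly, flag the rows the condition selects, then keep every row whose within-group position lies between the first flagged row minus `n_before` and the last flagged row plus `n_after`. The `toy` data frame, the helper columns `pos` and `sel`, and the parameters `n_before` and `n_after` are made up for illustration, and the `Caffeine > 0` check is replaced by `Caffeinefactor == "yes"` because the sample data has no `Caffeine` column.

```r
library(dplyr)

# Toy data standing in for the real table (illustrative values only)
toy <- data.frame(
  PATIENT.ID = rep(c(1L, 2L), each = 6),
  Caffeinefactor = c("no", "no", "no", "yes", "yes", "yes",
                     "no", "no", "yes", "no", "yes", "yes"),
  stringsAsFactors = FALSE
)

n_before <- 1  # assumed: rows to add before the first selected row
n_after  <- 1  # assumed: rows to add after the last selected row

res <- toy %>%
  group_by(PATIENT.ID) %>%
  mutate(
    pos       = row_number(),                        # position within the group
    first_yes = cumsum(Caffeinefactor == "yes"),
    # position of each "no" that occurs before the first "yes", else 0
    last_no   = ifelse(first_yes == 0 & Caffeinefactor == "no", pos, 0),
    # flag the first "yes" and the last "no" before it
    sel       = (first_yes == 1 & Caffeinefactor == "yes") |
                (last_no > 0 & last_no == max(last_no))
  ) %>%
  filter(any(sel)) %>%                               # drop groups with no match
  filter(pos >= min(pos[sel]) - n_before,
         pos <= max(pos[sel]) + n_after) %>%
  ungroup() %>%
  select(-pos, -first_yes, -last_no, -sel)
```

Because rows are kept by position rather than bound together from two results, all other columns stay untouched and no selected row is duplicated. If a group has fewer than `n_before` rows ahead of the first selected row, the window simply stops at the group boundary.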
EDIT:
The original dataset has more than 90 columns, and I also need to keep the values in these columns. I do not want to duplicate the rows that the code shown above selects.
To get the output I have now, I first prepared my data with the following code:
groups <- Data2 %>%
  setDT(Data2) %>%
  dplyr::group_by(PATIENT.ID)
groups_we_want_removed <- Data2 %>%
  setDT(Data2) %>%
  dplyr::group_by(PATIENT.ID) %>%
  dplyr::mutate(rownum = dplyr::row_number()) %>%
  dplyr::filter((rownum == 1 & Caffeinefactor == "yes"))
remove_people_with_only_no_caffeinefactor <- Data2 %>%
  setDT(Data2) %>%
  dplyr::group_by(PATIENT.ID) %>%
  dplyr::filter(Caffeinefactor == "yes")
desired_result <- groups %>%
  filter(!PATIENT.ID %in% groups_we_want_removed$PATIENT.ID) %>%
  filter(PATIENT.ID %in% remove_people_with_only_no_caffeinefactor$PATIENT.ID)
EDIT 2:
I know how I could select the neighbouring rows in theory, with this code:
desired_result[which(desired_result$Caffeinefactor == "yes") + c(-1:1), ]
However, I need this to be applied after I have matched the two rows that I want, as in the first code block.
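One caveat with the indexing idea, shown as a small sketch: when `which()` returns more than one index, adding `c(-1:1)` is recycled element by element (the first index gets -1, the second 0, the third +1, and so on) rather than expanding each index to its neighbours. Combining every index with every offset, for example via `outer()`, avoids that. The helper name `expand_idx` and the vector `x` are made up for illustration.

```r
# Hypothetical helper: expand row indices by n neighbours on each side,
# dropping duplicates and positions outside 1..n_rows
expand_idx <- function(idx, n, n_rows) {
  out <- sort(unique(as.vector(outer(idx, -n:n, "+"))))
  out[out >= 1 & out <= n_rows]
}

x   <- c("no", "no", "yes", "no", "no", "no", "yes")
idx <- which(x == "yes")                 # positions of the matches
keep <- expand_idx(idx, 1, length(x))    # matches plus one neighbour each side
x[keep]
```

Used on a whole table, this ignores group boundaries, so a neighbour could belong to a different `PATIENT.ID`; it would have to be applied per group to respect them.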
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow