R does not assign numbers to dates in the right order
I am working on a document where I have a list of tests with dates. I am trying to get R to pivot them horizontally, with the first test showing up first and the later tests showing up later. However, even after applying functions such as sort(), order() or group_by(), R still sometimes puts the tests in the wrong order after pivoting, with a later test ending up in the first column.
I think I should apply some sort of ordering to the date column before numbering, so that R gives the actual first test the first number, since that number is what I pivot on.
Any idea as to how I would go about this?
My dataframe looks like this:
employee nr. date date2 test_1 test_2
x 2010/01/10 2010/01/05 positive positive
.................................
The two dates should be switched, so that the earlier date comes first. The date is formatted as yyyy/mm/dd. In the original dataset it was formatted as dd/mm/yy (you can see the format change in the code).
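As a rough illustration of the format change described above, here is a minimal sketch in base R. The sample values and the names `raw` and `parsed` are made up for the example and are not taken from the dataset. It shows that dd/mm/yy strings sort as text, while parsed Date values sort chronologically.

```r
# Hypothetical sample values in the original dd/mm/yy layout
raw <- c("05/02/10", "10/01/10")

# Text sorting compares character by character, so 5 February
# ends up before 10 January
sort(raw)

# `format` describes the input layout; Date values then sort chronologically
parsed <- as.Date(raw, format = "%d/%m/%y")
sort(parsed)

# Convert back to text only for display, here as yyyy/mm/dd
format(sort(parsed), "%Y/%m/%d")
#> [1] "2010/01/10" "2010/02/05"
```

Keeping the column as Date until the very end means that any later sorting or numbering works on real dates instead of strings.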
My expected output should look something like this:
employee nr. date date2 test_1 test_2
x 2010/01/05 2010/01/10 positive positive
#parse the dd/mm/yy strings in "date" into Date objects so that R recognizes them as dates
#(origin is only relevant for numeric input, so it is left out here)
ct_clean$date <- as.Date(ct_clean$date, format = "%d/%m/%y")
###assign number to duplicate value of employee number (if multiple tests -> multiple entries)
ct_numbered <- ct_clean %>% group_by(employee) %>% mutate(test_nr = row_number())
#equivalent ways of writing the same numbering (results are printed, not stored)
ct_clean %>% group_by(employee) %>% mutate(test_nr = 1:n())
ct_clean %>% group_by(employee) %>% mutate(test_nr = seq_len(n()))
ct_clean %>% group_by(employee) %>% mutate(test_nr = seq_along(employee))
#spread out multiple tests for one individual horizontally
ct_wide <- ct_numbered %>% group_by(date) %>% pivot_wider(names_from = "test_nr",
values_from = "ct",
names_expand = TRUE, names_vary = "slowest")
#merging rows to include the test-data and test-number in the same row
ct_df <- ct_wide %>%
group_by(employee) %>%
mutate(id = seq_along(employee)) %>%
pivot_wider(names_from = id, values_from = date, names_prefix = "date") %>%
summarize_all(list(~ .[!is.na(.)][1]))
Solution 1:[1]
You can do this with if_else(), swapping the dates and the matching test results in every row where date1 is later than date2:
library(tidyverse)
d <- structure(list(employee = c("x", "y", "z"), date1 = structure(c(14619,
14611, 14619), class = "Date"), date2 = structure(c(14614, 14614,
14614), class = "Date"), test_1 = c("positive", "negative", "negative"
), test_2 = c("positive", "positive", "positive")), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L), spec = structure(list(
cols = list(employee = structure(list(), class = c("collector_character",
"collector")), date1 = structure(list(format = ""), class = c("collector_date",
"collector")), date2 = structure(list(format = ""), class = c("collector_date",
"collector")), test_1 = structure(list(), class = c("collector_character",
"collector")), test_2 = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
d
#> # A tibble: 3 × 5
#> employee date1 date2 test_1 test_2
#> <chr> <date> <date> <chr> <chr>
#> 1 x 2010-01-10 2010-01-05 positive positive
#> 2 y 2010-01-02 2010-01-05 negative positive
#> 3 z 2010-01-10 2010-01-05 negative positive
d |>
mutate(date1 = if_else(d$date1 < d$date2, d$date1, d$date2),
date2 = if_else(d$date1 < d$date2, d$date2, d$date1),
test_1 = if_else(d$date1 < d$date2, d$test_1, d$test_2),
test_2 = if_else(d$date1 < d$date2, d$test_2, d$test_1)
)
#> # A tibble: 3 × 5
#> employee date1 date2 test_1 test_2
#> <chr> <date> <date> <chr> <chr>
#> 1 x 2010-01-05 2010-01-10 positive positive
#> 2 y 2010-01-02 2010-01-05 negative positive
#> 3 z 2010-01-05 2010-01-10 positive negative
Created on 2022-03-28 by the reprex package (v2.0.1)
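If there can be more than two tests per employee, pairwise if_else() swaps become hard to maintain. One possible alternative, sketched below, is to reshape to long format, sort, renumber, and reshape back. The data are the same values as in `d` above, rebuilt with tibble() so the block runs on its own. Renaming date1/date2 to date_1/date_2 and the names `nr` and `sorted` are choices made for this sketch only.

```r
library(dplyr)
library(tidyr)

# Same values as `d` above
d <- tibble(
  employee = c("x", "y", "z"),
  date1 = as.Date(c("2010-01-10", "2010-01-02", "2010-01-10")),
  date2 = as.Date(c("2010-01-05", "2010-01-05", "2010-01-05")),
  test_1 = c("positive", "negative", "negative"),
  test_2 = c("positive", "positive", "positive")
)

sorted <- d |>
  # Make every column follow the "<name>_<number>" pattern
  rename(date_1 = date1, date_2 = date2) |>
  # One row per employee and test, with columns `date` and `test`
  pivot_longer(-employee, names_to = c(".value", "nr"), names_sep = "_") |>
  # Sort chronologically within each employee, then renumber
  arrange(employee, date) |>
  group_by(employee) |>
  mutate(nr = row_number()) |>
  ungroup() |>
  # Back to one row per employee: date_1, date_2, test_1, test_2
  pivot_wider(names_from = nr, values_from = c(date, test))

sorted
```

Because each date travels together with its test result in the long format, the two cannot get out of step, and the same pipeline works unchanged for a third or fourth test.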
Solution 2:[2]
I found the answer to my problem:
The rows had to be sorted by date, with arrange(), in the code that assigns numbers to the duplicates.
The original code looked like this:
ct_numbered <- ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = row_number())
ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = 1:n())
ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = seq_len(n()))
ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = seq_along(umcg))
This is the solution I used:
#ymd() comes from the lubridate package
ct_numbered <- ct_variant %>%
  arrange(ymd(ct_variant$date)) %>%
  group_by(date, umcg) %>%
  mutate(test_nr = row_number())
ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = 1:n())
ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = seq_len(n()))
ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = seq_along(umcg))
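The idea of sorting before numbering can be sketched on a small, made-up long-format table. The data, the column names `employee`, `date` and `ct`, and the object names `ct_long` and `ct_wide` are assumptions for this example. Unlike the code above, the sketch groups by employee alone, which is a guess at the intended numbering (one counter per employee across all dates).

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data: one row per test, not in date order
ct_long <- tibble(
  employee = c("x", "x", "y"),
  date = as.Date(c("2010-01-10", "2010-01-05", "2010-01-02")),
  ct = c("positive", "negative", "negative")
)

ct_wide <- ct_long %>%
  # Sort first, so that rows within each employee are chronological
  arrange(employee, date) %>%
  group_by(employee) %>%
  # row_number() now gives 1 to the earliest test of each employee
  mutate(test_nr = row_number()) %>%
  ungroup() %>%
  # One row per employee: date_1, date_2, ct_1, ct_2
  pivot_wider(names_from = test_nr, values_from = c(date, ct))

ct_wide
```

Employees with fewer tests than the maximum get NA in the unused columns, so employee y has NA in date_2 and ct_2 here.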
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | shs |
| Solution 2 | Milan Post |
