R does not assign numbers to dates in the right order

I am working on a document where I have a list of tests with dates. I am trying to get R to pivot them horizontally, with the first test showing up first and the later tests showing up later. However, when applying functions such as sort(), order(), or even group_by(), R still sometimes puts a later test in the first column after pivoting to wide format.

I think I should apply some sort of ordering to the date column before numbering, so that R assigns the first number to the actual first test; that number is what I pivot on.

Any idea as to how I would go about this?

My dataframe looks like this:

employee nr.  date          date2        test_1   test_2  
x             2010/01/10    2010/01/05   positive positive 

.................................

It should be so that the 2 dates are switched. The date is formatted as yyyy/mm/dd. In the original dataset it was formatted as dd/mm/yy (you can see the format change in the code).
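For illustration, here is a minimal sketch of that format change, using made-up dates rather than the real dataset. It shows why the conversion matters for ordering: dd/mm/yy strings sort alphabetically, while Date values sort chronologically.

```r
# Made-up dd/mm/yy strings (not taken from the real dataset)
raw <- c("10/01/10", "05/02/10", "20/12/09")

# Sorting the raw strings is alphabetical, not chronological
sort(raw)

# Parse to Date; %y is the two-digit year, %d and %m are day and month
parsed <- as.Date(raw, format = "%d/%m/%y")

# Sorting Date values is chronological
sorted_dates <- sort(parsed)

# Display in the yyyy/mm/dd layout used in this question
formatted <- format(sorted_dates, "%Y/%m/%d")
formatted
```

Once the column is of class Date, arrange(), sort(), and order() all respect calendar order.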

My expected output should look something like this:

employee nr.  date          date2        test_1   test_2  
x             2010/01/05    2010/01/10   positive positive 

#specify dates as variable "date" for R to recognize the variable

ct_clean$date <- as.Date(ct_clean$date, origin = "1899-12-30", format = "%d/%m/%y")

###assign number to duplicate value of employee number (if multiple tests -> multiple entries)

ct_numbered <- ct_clean %>% group_by(employee) %>% mutate(test_nr = row_number())
ct_clean %>% group_by(employee) %>% mutate(test_nr = 1:n())
ct_clean %>% group_by(employee) %>% mutate(test_nr = seq_len(n()))
ct_clean %>% group_by(employee) %>% mutate(test_nr = seq_along(employee))              

#spread out multiple test for one individual horizontally

ct_wide <- ct_numbered %>%
  group_by(date) %>%
  pivot_wider(names_from = "test_nr",
              values_from = "ct",
              names_expand = TRUE, names_vary = "slowest")

#merging rows to include the test-data and test-number in the same row

ct_df <- ct_wide %>%
  group_by(employee) %>%
  mutate(id = seq_along(employee)) %>%
  pivot_wider(names_from = id, values_from = date, names_prefix = "date") %>%
  summarize_all(list(~ .[!is.na(.)][1]))


Solution 1:[1]

You can do this by using if_else():

library(tidyverse)

d <- structure(list(employee = c("x", "y", "z"), date1 = structure(c(14619, 
14611, 14619), class = "Date"), date2 = structure(c(14614, 14614, 
14614), class = "Date"), test_1 = c("positive", "negative", "negative"
), test_2 = c("positive", "positive", "positive")), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L), spec = structure(list(
    cols = list(employee = structure(list(), class = c("collector_character", 
    "collector")), date1 = structure(list(format = ""), class = c("collector_date", 
    "collector")), date2 = structure(list(format = ""), class = c("collector_date", 
    "collector")), test_1 = structure(list(), class = c("collector_character", 
    "collector")), test_2 = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))
d
#> # A tibble: 3 × 5
#>   employee date1      date2      test_1   test_2  
#>   <chr>    <date>     <date>     <chr>    <chr>   
#> 1 x        2010-01-10 2010-01-05 positive positive
#> 2 y        2010-01-02 2010-01-05 negative positive
#> 3 z        2010-01-10 2010-01-05 negative positive

d |> 
  mutate(date1 = if_else(d$date1 < d$date2, d$date1, d$date2),
         date2 = if_else(d$date1 < d$date2, d$date2, d$date1), 
         test_1 = if_else(d$date1 < d$date2, d$test_1, d$test_2),
         test_2 = if_else(d$date1 < d$date2, d$test_2, d$test_1)
         )
#> # A tibble: 3 × 5
#>   employee date1      date2      test_1   test_2  
#>   <chr>    <date>     <date>     <chr>    <chr>   
#> 1 x        2010-01-05 2010-01-10 positive positive
#> 2 y        2010-01-02 2010-01-05 negative positive
#> 3 z        2010-01-05 2010-01-10 positive negative

Created on 2022-03-28 by the reprex package (v2.0.1)
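As a variation on the same idea, the swap condition can be computed once and the results written to temporary columns, so the `d$` prefix is not needed to protect against mutate() overwriting date1 before date2 is computed. This is only a sketch: the data below re-creates the example tibble by hand, and the names `swap` and `new_*` are placeholders introduced for illustration.

```r
library(dplyr)

# Hand-built copy of the example data from this answer
d <- tibble(
  employee = c("x", "y", "z"),
  date1 = as.Date(c("2010-01-10", "2010-01-02", "2010-01-10")),
  date2 = as.Date(c("2010-01-05", "2010-01-05", "2010-01-05")),
  test_1 = c("positive", "negative", "negative"),
  test_2 = c("positive", "positive", "positive")
)

result <- d |>
  mutate(
    # TRUE where the two tests are in the wrong chronological order
    swap = date1 > date2,
    # Write to temporary columns so the originals stay intact
    new_date1 = if_else(swap, date2, date1),
    new_date2 = if_else(swap, date1, date2),
    new_test_1 = if_else(swap, test_2, test_1),
    new_test_2 = if_else(swap, test_1, test_2)
  ) |>
  # Keep only the reordered columns, under the original names
  select(employee,
         date1 = new_date1, date2 = new_date2,
         test_1 = new_test_1, test_2 = new_test_2)

result
```

Like the if_else() version above, this only handles exactly two tests per employee; with more tests, sorting the long-format data before pivoting is likely the simpler route.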

Solution 2:[2]

I found the answer to my problem:

The rows had to be sorted by date, with arrange(), in the code that assigns numbers to the duplicates, before the numbering step.

The original code looked like this:

ct_numbered <- ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = row_number())
ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = 1:n())
ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = seq_len(n()))
ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = seq_along(umcg))

This is the solution I used:

ct_numbered <- ct_variant %>% arrange(ymd(ct_variant$date)) %>%
  group_by(date, umcg) %>% mutate(test_nr = row_number())
ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = 1:n())
ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = seq_len(n()))
ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = seq_along(umcg))
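To show the sort-then-number idea end to end, here is a small self-contained sketch on made-up data. It is not the original pipeline: it assumes one row per test, groups by employee only (so the numbering runs across each employee's tests), and uses the hypothetical names `ct_long` and `ct_wide_demo`.

```r
library(dplyr)
library(tidyr)

# Made-up long-format data, deliberately out of date order
ct_long <- tibble(
  employee = c("x", "x", "y"),
  date = as.Date(c("10/01/10", "05/01/10", "02/01/10"), format = "%d/%m/%y"),
  ct = c("positive", "negative", "negative")
)

ct_wide_demo <- ct_long %>%
  # Sort first, so row_number() follows chronological order
  arrange(employee, date) %>%
  group_by(employee) %>%
  mutate(test_nr = row_number()) %>%
  ungroup() %>%
  # One row per employee; columns become date_1, date_2, ct_1, ct_2
  pivot_wider(
    id_cols = employee,
    names_from = test_nr,
    values_from = c(date, ct)
  )

ct_wide_demo
```

Because the numbering happens after arrange(), test number 1 is always the earliest test for that employee, and employees with fewer tests get NA in the later columns.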

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    shs
Solution 2    Milan Post