'left_join resulting in null values in R

I would like to combine data tables using a left_join by the 'Date' column. The resulting table merges the table, but the columns which joined to the data table are all NAs.

The statement used:

merge <- left_join(activity, sleep, by =c("Date" = "Date"))

Could this be an issue of Date formatting? The date format in 'activity' is %m/%d/%y and in 'sleep' is %m/%d/%y %I:%M:%S %p. The merged table's date format is %m/%d/%y.



Solution 1:[1]

  1. Convert to "real" Date-class and POSIXt-class data first.

    This is the recommended approach unless immediately after this join you know that you will not be looking at timestamps ever again. That is, if you even need "time between activities" or "counts per day" or something similar, you must really go with this option.

    First, convert all dates to Date and timestamps to POSIXt. I'm going to guess at the fields here but include a placeholder for you to add more if you have them.

    library(dplyr)
    # replace `c(Date)` with more if you have more, e.g., `c(Date, Date2, Date3)`
    activity <- activity %>%
      mutate(across(c(Date), ~ as.Date(., format = "%m/%d/%y")))
    # similarly, `c(Date)`
    sleep <- sleep %>%
      mutate(
        across(c(Date), ~ as.POSIXct(., format = "%m/%d/%y %I:%M:%S %p")),
        Date2 = as.Date(timestamp)
      )
    merged <- left_join(activity, sleep, by = c("Date" = "Date2"))
    

    I created Date2 here because R does not compare Dates equally with POSIXt variables. The rationale is unfortunately clear:

    Sys.Date() ; Sys.time()
    # [1] "2022-02-04"
    # [1] "2022-02-04 23:53:37 EST"
    as.numeric(Sys.Date()) ; as.numeric(Sys.time())
    # [1] 19027
    # [1] 1644036818
    

    By converting the POSIXt to Date fixes that portion.

  2. Substrings.

    Really, this is a hack, and only really useful if you intentionally will discard time/date from the rest of your analysis. (Even then, I feel hacky just recommending this ...)

    sleep %>%
      mutate(Date2 = substring(Date, 1, 8)) %>%
      left_join(activity, ., by = c("Date" = "Date2"))
    

Just to be sure ... your data has 2-digit years, right? If not, then change "%y" to "%Y", and substring(Date, 1, 8) to substring(Date, 1, 10).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 r2evans