Converting character to HMS - big data

I have a huge data frame with 30 million rows. I'm importing it with data.table's

fread("")

This made the import much faster, but it reads the HMS column in as character.

Here is an example data frame:

employee <- c('John Doe','Peter Gynn','Jolie Hope')
timeWork <- c("00:00:00.239", "00:01:01.029", "00:00:01")
employ.data <- data.frame(employee, timeWork)

I'm able to transform it back to HMS using

employ.data$timeWork <- hms::as_hms(employ.data$timeWork)

but it takes a few minutes because of the data size.

I've heard about the function fastPOSIXct, but I could not understand how to use it. When I try:

employ.data$timeWork <- fasttime::fastPOSIXct(employ.data$timeWork)

It turns all my data to NA.

Is there any faster way? Thanks a lot

r


Solution 1:[1]

Unfortunately, while data.table 1.14.0 added speed improvements when reading in full datetime data, data.table#4841 suggests the same is not implemented for time-only (ITime) columns.

As an alternative, the vroom package provides a time parser as a column type. While vroom advertises good speed in general, I have not benchmarked it against really large files.

txt <- '"employee","timeWork"
"John Doe","00:00:00.239"
"Peter Gynn","00:01:01.029"
"Jolie Hope","00:00:01"'

library(vroom)

types <- list(
  employee = col_character(),
  timeWork = col_time(format = "%H:%M:%OS")
)
out <- vroom(txt, delim = ",", col_types = types)

out
# # A tibble: 3 x 2
#   employee   timeWork   
#   <chr>      <drtn>     
# 1 John Doe    0.239 secs
# 2 Peter Gynn 61.029 secs
# 3 Jolie Hope  1.000 secs

str(out, give.attr = FALSE)
# spec_tbl_df [3 x 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
#  $ employee: chr [1:3] "John Doe" "Peter Gynn" "Jolie Hope"
#  $ timeWork: 'hms' num [1:3] 0.239 61.029 1

Note that the default format= is "%H:%M:%S" which, in this example, will not parse the decimal seconds correctly, so "%OS" is required (otherwise timeWork[1] is 0).
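
If the column has already been read in as character (for example via fread), another option is to skip string-to-time parsing and compute the seconds with vectorized arithmetic instead. The following is a minimal sketch, assuming every value has the fixed layout HH:MM:SS with optional fractional seconds and no malformed entries. parse_hms_seconds is a hypothetical helper name, not part of any package, and it has not been benchmarked on 30 million rows.

```r
# Hypothetical helper (not from any package): convert "HH:MM:SS[.fff]"
# strings to numeric seconds using only vectorized base R functions.
# Assumes two-digit hours and minutes; other layouts give wrong results or NA.
parse_hms_seconds <- function(x) {
  h <- as.numeric(substr(x, 1, 2))          # hours
  m <- as.numeric(substr(x, 4, 5))          # minutes
  s <- as.numeric(substr(x, 7, nchar(x)))   # seconds, including any fraction
  h * 3600 + m * 60 + s
}

timeWork <- c("00:00:00.239", "00:01:01.029", "00:00:01")
secs <- parse_hms_seconds(timeWork)

# Wrap the seconds as an hms vector if the hms package is installed
if (requireNamespace("hms", quietly = TRUE)) {
  timeWork_hms <- hms::hms(seconds = secs)
}
```

Because substr() and as.numeric() operate on the whole vector at once, this avoids per-element parsing; whether it is actually faster than hms::as_hms() on your data would need to be measured.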

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow