Converting character to HMS - big data
I have a huge DF with 30 million rows. I'm importing it quickly with
fread("")
which made everything faster, but it imports the HMS column as character.
Here is an example DF:
employee <- c('John Doe','Peter Gynn','Jolie Hope')
timeWork <- c("00:00:00.239", "00:01:01.029", "00:00:01")
employ.data <- data.frame(employee, timeWork)
I'm able to transform it back to HMS using:
employ.data$timeWork <- hms::as_hms(employ.data$timeWork)
but it takes a few minutes because of the data size.
I've heard about the function fastPOSIXct, but I could not understand how to use it. When I try:
employ.data$timeWork <- fasttime::fastPOSIXct(employ.data$timeWork)
it turns all my data to NA.
Is there a faster way? Thanks a lot.
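A hedged side note on the NA result: fasttime::fastPOSIXct() parses full date-time strings, so bare clock times like "00:00:00.239" do not match and come back as NA. One fast, vectorized base-R workaround (a sketch, not the only option) is to compute seconds arithmetically from the fixed-width "HH:MM:SS[.fff]" layout and then wrap the numeric result with hms::as_hms(), which accepts numeric seconds:

```r
# Vectorized conversion of "HH:MM:SS[.fff]" strings to numeric seconds.
# No per-element parsing loop, so it scales to tens of millions of rows.
timeWork <- c("00:00:00.239", "00:01:01.029", "00:00:01")

secs <- as.numeric(substr(timeWork, 1, 2)) * 3600 +   # hours
        as.numeric(substr(timeWork, 4, 5)) * 60   +   # minutes
        as.numeric(substring(timeWork, 7))            # seconds (incl. fraction)

secs
# afterwards: employ.data$timeWork <- hms::as_hms(secs)
```

This assumes the column always has the zero-padded "HH:MM:..." shape that fread() produced; if some rows deviate, the substr() offsets would need guarding.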
Solution 1
Unfortunately, while data.table-1.14.0 added speed improvements when reading in full datetime data, data.table#4841 suggests it is not implemented for ITime objects.
As an alternative, the vroom package provides a time parser as a column type. While vroom claims good speed in general, I have not benchmarked it against really large files.
txt <- '"employee","timeWork"
"John Doe","00:00:00.239"
"Peter Gynn","00:01:01.029"
"Jolie Hope","00:00:01"'
library(vroom)
types <- list(
employee = col_character(),
timeWork = col_time(format = "%H:%M:%OS")
)
out <- vroom(txt, delim = ",", col_types = types)
out
# # A tibble: 3 x 2
# employee timeWork
# <chr> <drtn>
# 1 John Doe 0.239 secs
# 2 Peter Gynn 61.029 secs
# 3 Jolie Hope 1.000 secs
str(out, give.attr = FALSE)
# spec_tbl_df [3 x 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
# $ employee: chr [1:3] "John Doe" "Peter Gynn" "Jolie Hope"
# $ timeWork: 'hms' num [1:3] 0.239 61.029 1
Note that the default format= is "%H:%M:%S" which, in this example, will not parse the decimal seconds correctly, so "%OS" is required (otherwise timeWork[1] is 0).
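The same %S versus %OS distinction can be seen in base R's strptime(), which may make the behavior easier to verify in isolation (a small illustrative sketch, not part of the vroom solution):

```r
# %S parses only whole seconds, so the ".239" fraction is dropped;
# %OS parses seconds including the fractional part.
t_whole <- strptime("00:00:00.239", format = "%H:%M:%S")
t_frac  <- strptime("00:00:00.239", format = "%H:%M:%OS")

t_whole$sec   # 0
t_frac$sec    # 0.239
```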
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
