'How to set the type for each column from a row?

When I work with my own data, I like to creat a dataset in this way :

birthdate height sick color
date numeric logical list
Date of birth Weight (cm) Is the patient sick ? Hair color
20/03/1991 163 1 blue
10/11/1993 185 0 brown

Creating a new dataset

I like to keep the original dataset, and create a new one without the 2 first rows.

data_work <- data.table::copy(data.table::as.data.table(data))
data_work <- ds_work[i = -c(1:2)]
birthdate height sick color
20/03/1991 163 1 blue
10/11/1993 185 0 brown

Labels

Then, I can use the 2nd row of the original dataset to label every columns.

label_colname <- as.character(data[2,])
data_work <- sjlabelled::set_label(data_work, label = label_colname)

Col types

I would like to basically do the same and use the first row of the original dataset to control all of my columns types. Is it possible ?

Note : I work with data.table and possible col types include : "skip", "guess", "logical", "numeric", "date", "text" or "list"



Solution 1:[1]

Here's a general method for whatever "types" you may need.

The first premise is that we have unambiguous functions for each of the types you may have, namely

funcs <- list(
  guess   = function(z) type.convert(z, as.is = TRUE),
  skip    = identity,
  logical = function(z) !is.na(z) & tolower(z) %in% c("1", "true"),
  numeric = as.numeric,
  date    = function(z) as.Date(z, format = "%d/%m/%Y"),
  text    = as.character,
  list    = as.list)

Basic cleanup of the data to extract the types,

types <- unlist(data_work[1,])
labels <- unlist(data_work[2,])
data_work <- data_work[-(1:2),]
data_work
#     birthdate height   sick  color
#        <char> <char> <char> <char>
# 1: 20/03/1991    163      1   blue
# 2: 10/11/1993    185      0  brown

From here, we'll Map each column over its respective function:

cols <- names(data_work)[ types %in% names(funcs) ]
funs <- funcs[ types[ types %in% names(funcs) ] ]
data_work[, (cols) := Map(function(f, x) f(x), funs, .SD), 
           .SDcols = cols]
#     birthdate height   sick  color
#        <Date>  <num> <lgcl> <list>
# 1: 1991-03-20    163   TRUE   blue
# 2: 1993-11-10    185  FALSE  brown

Data

data_work <- structure(list(birthdate = c("date", "Date of birth", "20/03/1991", "10/11/1993"), height = c("numeric", "Weight (cm)", "163", "185"), sick = c("logical", "Is the patient sick ?", "1", "0"), color = c("list", "Hair color", "blue", "brown")), class = "data.frame", row.names = c(NA, -4L))
setDT(data_work)

Solution 2:[2]

data <- read.csv("data.csv", sep=",", header= TRUE)
data <- as.data.table(data)

when data is the data-table from your question including the first row (with the data types) after the header with column names, then the following will give you all columns with e.g. "date" in the first row as date_columns and then change the data type of these columns accordingly.

date_columns <- colnames(data)[grepl("date",data[1,])]
data[ ,(date_columns) := lapply(.SD, as.Date), .SDcols = date_columns]

just repeat these two steps for every type you want to detect (replace "date" in the code for the other data types). Afterward you can delete the first row where you specified the types from your data-table.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Flores