Function for checking data types of several data frames in R

I would like to create a function that takes a variable (not known in advance) number of data frames as input and returns a data frame listing the data type of each column of each input data frame.

Example: I have the 2 data frames below (the number of data frames can vary, so I am not sure how to pass them as a function argument).


# Dataframe 1
kpi_id <- c("SL",  "OOS")
kpi_val <- c (1,2)

df1 <-  data.frame(kpi_id,   kpi_val)

> sapply(df1, class)

   kpi_id     kpi_val 
"character"   "numeric"

# Dataframe 2
kpi_id <- c("SL",  "OOS")
kpi_val <- c ("3", "4")

df2 <-  data.frame(kpi_id,   kpi_val)

> sapply(df2, class)
  kpi_id     kpi_val 
"character" "character"

I can get the result manually, one data frame at a time, as below:

library(dplyr)  # provides bind_cols()

df_types1 <- as.data.frame(sapply(df1, class))
colnames(df_types1)[1] <- deparse(substitute(df1))

df_types2 <- as.data.frame(sapply(df2, class))
colnames(df_types2)[1] <- deparse(substitute(df2))

df_types3 <- bind_cols(df_types1, df_types2)

> df_types3
              df1       df2
kpi_id  character   character
kpi_val   numeric   character

How can I create a function that produces the same output when the number of data frames is not known in advance?



Solution 1:[1]

Using rapply.

rapply(list(df1=df1, df2=df2), class, how='l') |>
  do.call(what='cbind')
#                 df1         df2        
# kpi_id  "character" "character"
# kpi_val "numeric"   "character"

If you get weird output due to multiple classes,

df1$date <- df2$date <- as.POSIXct(Sys.Date())

rapply(list(df1=df1, df2=df2), class, how='l') |>
  do.call(what='cbind')
#                df1         df2        
# kpi_id  "character" "character"
# kpi_val "numeric"   "character"
# date    character,2 character,2

you could use data.class which returns just the first one:

rapply(list(df1=df1, df2=df2), data.class, how='l') |>
  do.call(what='cbind')
#                df1         df2        
# kpi_id  "character" "character"
# kpi_val "numeric"   "character"
# date    "POSIXct"   "POSIXct"
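
To address the original request for a function that accepts any number of data frames, the data.class idea above could be wrapped roughly as in the sketch below. The name df_classes and its labelling logic are assumptions made for illustration, not part of the original answer: it takes the data frames through `...`, labels each output column with the expression used in the call (or with an explicit argument name, if given), and fills NA where a data frame lacks a column.

```r
# Hypothetical helper: name and behaviour are assumptions for illustration.
df_classes <- function(...) {
  dfs <- list(...)
  stopifnot(length(dfs) > 0, all(vapply(dfs, is.data.frame, logical(1))))

  # Recover the expressions used in the call, e.g. "df1", "df2"
  labels <- vapply(as.list(substitute(list(...)))[-1],
                   function(e) deparse(e)[1], character(1))
  # Explicit argument names, as in df_classes(a = df1), take precedence
  if (!is.null(names(dfs))) {
    named <- nzchar(names(dfs))
    labels[named] <- names(dfs)[named]
  }
  names(dfs) <- unname(labels)

  # Union of all column names, in order of first appearance
  cols <- unique(unlist(lapply(dfs, names), use.names = FALSE))

  # One character column per data frame; NA where a column is absent
  out <- do.call(cbind, lapply(dfs, function(d) {
    cls <- vapply(d, data.class, character(1))
    unname(cls[cols])
  }))
  rownames(out) <- cols
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Usage with the two data frames from the question
df1 <- data.frame(kpi_id = c("SL", "OOS"), kpi_val = c(1, 2),
                  stringsAsFactors = FALSE)
df2 <- data.frame(kpi_id = c("SL", "OOS"), kpi_val = c("3", "4"),
                  stringsAsFactors = FALSE)
res <- df_classes(df1, df2)
print(res)
```

The result has one row per column name and one column per data frame, matching the layout asked for in the question.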

Solution 2:[2]

Here is a function you can use; pass it a list of data frames, which may be named or unnamed:

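df_types <- function(dfs) {
  # Build one block of rows per data frame, then stack the blocks
  do.call(
    rbind,
    lapply(seq_along(dfs), function(d) {
      data.frame(
        # Label rows with the list name if the list is named, else the position
        df = if (is.null(names(dfs))) d else names(dfs)[d],
        col = names(dfs[[d]]),
        type = sapply(dfs[[d]], typeof),
        row.names = NULL
      )
    })
  )
}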

Usage

df_types(list("a" = df1,"b" = df2))

Output:

  df     col      type
1  a  kpi_id character
2  a kpi_val    double
3  b  kpi_id character
4  b kpi_val character
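
Note that this function reports typeof, the internal storage type, which is why kpi_val in the first data frame shows as double where the other answers show numeric. The short illustration below uses made-up vectors (not from the question) to show how class and typeof can differ; swapping typeof for class inside df_types would be one way to match the other answers.

```r
x <- c(1, 2)
class(x)    # "numeric"
typeof(x)   # "double"

f <- factor(c("SL", "OOS"))
class(f)    # "factor"
typeof(f)   # "integer"

d <- as.Date("2024-01-01")
class(d)    # "Date"
typeof(d)   # "double"
```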

Solution 3:[3]

Here is another option using tidyverse, with janitor and data.table used to reshape the result into the desired format:

library(tidyverse)

lst(df1, df2) %>%
  map_dfr(., ~ map_df(.x, class), .id = "var") %>%
  data.table::transpose(keep.names = "var") %>%
  janitor::row_to_names(1) %>%
  as_tibble() %>%
  column_to_rownames("var")

Output

              df1       df2
kpi_id  character character
kpi_val   numeric character

Solution 4:[4]

library(janitor)
compare_df_cols(df1, df2)

Output

  column_name       df1       df2
1      kpi_id character character
2     kpi_val   numeric character
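
Since the number of data frames is not known in advance, it may be more convenient to collect them in a list first. As far as I know, compare_df_cols also accepts a list of data frames and has a return argument for showing only mismatched columns; the sketch below assumes that behaviour and that janitor is installed, so check the package documentation for your version.

```r
# Sketch only: assumes janitor is installed and that compare_df_cols()
# accepts a list of data frames plus a `return` argument.
if (requireNamespace("janitor", quietly = TRUE)) {
  df1 <- data.frame(kpi_id = c("SL", "OOS"), kpi_val = c(1, 2),
                    stringsAsFactors = FALSE)
  df2 <- data.frame(kpi_id = c("SL", "OOS"), kpi_val = c("3", "4"),
                    stringsAsFactors = FALSE)

  # Any number of data frames can be collected in the list
  dfs <- list(df1 = df1, df2 = df2)

  cmp <- janitor::compare_df_cols(dfs)
  print(cmp)

  # Keep only the columns whose classes differ between data frames
  print(janitor::compare_df_cols(dfs, return = "mismatch"))
}
```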

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 langtang
Solution 3 AndrewGB
Solution 4 Sam Firke