Error: Data should be a matrix or data frame [duplicate]

FULL EDIT:

A script that has always worked has started giving me this error:

Error: Data should be a matrix or data frame

Even though the data I'm using is a data frame:

> is.data.frame(Dataset)
[1] TRUE

The error comes up when I run a for loop to impute some missing values:

for (i in 2:length(Dataset[-1])) {
  Dataset[,i] <- impute(Dataset[,i], "random")
}

I've already tried restarting R, clearing the cache with gc() and rm(list=ls()), and restarting my PC (multiple times, as my R does seem to have a problem with cache clearing). I even reinstalled RStudio.

This is my script:

library(table1)
library(Hmisc)
library(tidyverse)
library(gmodels)
library(ggpubr)
library(reshape2)
library(ggplot2)
library(lares)
library(dplyr)
library(plyr)

Dataset <- read.csv(.........., sep=";")

names(Dataset)[1] <- "ID"

Dataset <- as.data.frame(Dataset)


set.seed(1)

for (i in 2:length(Dataset[-1])) {
  Dataset[,i] <- impute(Dataset[,i], "random")
}

Error: Data should be a matrix or data frame

With one of your suggestions (Dataset[,i] does not return a data.frame, Dataset[i] does), this is the result:

Error: Data should contain at least two columns

With across(everything(), ...) there are no errors, but nothing is imputed: viewing the data after running it, it's the same as before.

And this is a sample of my dataset (I'm sorry, this is the best I can do to share it):

ID  Gender  Age  var1  var2  Var3  Var4   Var5
A   0       13   30    0     1     32.7   49.57
b   0       18   50    0     1     47.85
c   0            40    1     1     47.86  44.22
d   1       14   70    0     1     70.76
e   1       14   80    1     1     36.06  62.27
f   0       16   60    1           35.73  60.27
g   0       14         1     0     57.94  60.28
h   0       19   50    0     0     60.88  37.8
i   0       13   30    1     0     54.29  67.47
j   0       15   50    1     1     54.30  60.27

EXTRA: using mtcars

Dataset<- mtcars

set.seed(1)

for (i in 2:length(Dataset[-1])) {
  Dataset[i] <- impute(Dataset[i], "random")
}

Error: Data should contain at least two columns
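
For reference, a minimal base-R sketch of how the three indexing forms differ on mtcars, and of which columns the range 2:length(Dataset[-1]) actually covers. Nothing here calls impute(); it only inspects the objects that would be passed to it.

```r
Dataset <- mtcars

class(Dataset[, 2])    # "numeric": single-column [, i] drops to a plain vector
class(Dataset[2])      # "data.frame": [i] keeps a one-column data frame
ncol(Dataset[2])       # 1
class(Dataset[[2]])    # "numeric": [[i]] extracts the column itself

# Dataset[-1] has one column fewer than Dataset, so a loop over
# 2:length(Dataset[-1]) stops one short of the last column
length(Dataset[-1])    # 10
ncol(Dataset)          # 11
```

A function that requires a data frame with at least two columns, as the two error messages suggest, would reject both the vector and the one-column data frame. If the intent is to visit every column after the first, 2:ncol(Dataset) may be the range the loop needs.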


Solution 1:^[1]

using dataframe Dataset from your edited question:

## to help others reproduce the issue:
Dataset <-
structure(list(ID = c("A", "b", "c", "d", "e", "f", "g", "h", 
"i", "j"), Gender = c(0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L), 
    Age = c(13L, 18L, NA, 14L, 14L, 16L, 14L, 19L, 13L, 15L), 
    var1 = c(30L, 50L, 40L, 70L, 80L, 60L, NA, 50L, 30L, 50L), 
    var2 = c(0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L), Var3 = c(1L, 
    1L, 1L, 1L, 1L, NA, 0L, 0L, 0L, 1L), Var4 = c(32.7, 47.85, 
    47.86, 70.76, 36.06, 35.73, 57.94, 60.88, 54.29, 54.3), Var5 = c(49.57, 
    NA, 44.22, NA, 62.27, 60.27, 60.28, 37.8, 67.47, 60.27)), class = "data.frame", row.names = c(NA, 
10L))

this is your input dataframe Dataset (note the NAs):

## > glimpse(Dataset)
## Rows: 10
## Columns: 8
## $ ID     <chr> "A", "b", "c", "d", "e", "f", "g", "h", "i", "j"
## $ Gender <int> 0, 0, 0, 1, 1, 0, 0, 0, 0, 0
## $ Age    <int> 13, 18, NA, 14, 14, 16, 14, 19, 13, 15
## $ var1   <int> 30, 50, 40, 70, 80, 60, NA, 50, 30, 50
## $ var2   <int> 0, 0, 1, 0, 1, 1, 1, 0, 1, 1
## $ Var3   <int> 1, 1, 1, 1, 1, NA, 0, 0, 0, 1
## $ Var4   <dbl> 32.70, 47.85, 47.86, 70.76, 36.06, 35.73, 57.94, 60.88, 54.29, ~
## $ Var5   <dbl> 49.57, NA, 44.22, NA, 62.27, 60.27, 60.28, 37.80, 67.47, 60.27

you need libraries {dplyr} and {Hmisc} (provides impute()):

library(dplyr)
library(Hmisc)
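
Something worth checking before going further, offered as a possibility rather than a confirmed diagnosis: when several attached packages export a function with the same name, a bare call resolves to the one attached last. The question's script attaches other packages after {Hmisc}, so if any of them exports its own impute(), that version would be called instead. The sketch below uses only base R tools plus Hmisc::impute() to see which one is in use; the vector x is made up for illustration.

```r
library(Hmisc)

# Which package does a bare `impute` resolve to in this session?
environmentName(environment(impute))   # "Hmisc" unless another attached package masks it

# Every attached package that provides an object called `impute`
find("impute")

# Calling with an explicit namespace sidesteps any masking
x <- c(13, 18, NA, 14)                 # hypothetical example vector
set.seed(1)
x_imputed <- Hmisc::impute(x, "random")
anyNA(x_imputed)                       # FALSE
```

If find("impute") lists more than one package, writing Hmisc::impute() explicitly removes the ambiguity.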

now do the imputation across all columns (= everything()):

Dataset_imputed <- Dataset %>%
  mutate(across(everything(),
                ## "random" fills each NA with a value sampled from the observed ones
                function(x) impute(x, "random")
                )
         )

... there you go:

## > glimpse(Dataset_imputed)
## Rows: 10
## Columns: 8
## $ ID     <chr> "A", "b", "c", "d", "e", "f", "g", "h", "i", "j"
## $ Gender <int> 0, 0, 0, 1, 1, 0, 0, 0, 0, 0
## $ Age    <impute> 13, 18, 14, 14, 14, 16, 14, 19, 13, 15
## $ var1   <impute> 30, 50, 40, 70, 80, 60, 40, 50, 30, 50
## $ var2   <int> 0, 0, 1, 0, 1, 1, 1, 0, 1, 1
## $ Var3   <impute> 1, 1, 1, 1, 1, 1, 0, 0, 0, 1
## $ Var4   <dbl> 32.70, 47.85, 47.86, 70.76, 36.06, 35.73, 57.94, 60.88, 54.29, ~
## $ Var5   <impute> 49.57, 49.57, 44.22, 62.27, 62.27, 60.27, 60.28, 37.80, 67.47, ~
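
The same imputation can be sketched as a loop, closer to the code in the question. This version assumes the goal is to impute every column except ID. It uses [[i]] so that impute() receives a vector, 2:ncol(Dataset) so that the last column is included, and the Hmisc:: prefix in case another package masks impute(). The small data frame is a made-up subset of the question's data.

```r
library(Hmisc)

# Hypothetical stand-in for the question's data (a few of its columns)
Dataset <- data.frame(
  ID   = c("A", "b", "c", "d"),
  Age  = c(13L, 18L, NA, 14L),
  Var5 = c(49.57, NA, 44.22, NA)
)

set.seed(1)
for (i in 2:ncol(Dataset)) {
  # [[i]] extracts the column as a vector; columns without NAs come back unchanged
  Dataset[[i]] <- Hmisc::impute(Dataset[[i]], "random")
}

colSums(is.na(Dataset))   # every count is 0 after imputation
```

Either way, the result has to be stored: mutate() returns a new data frame instead of modifying Dataset in place, which is why the pipeline above assigns to Dataset_imputed.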

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
