Error: Data should be a matrix or data frame [duplicate]
FULL EDIT:
A script that has always worked has begun to give me this error:
Error: Data should be a matrix or data frame
Even though the data I'm using are in a data frame:
> is.data.frame(Dataset)
[1] TRUE
The error comes up when I try to use a for loop to impute some missing values:
for (i in 2:length(Dataset[-1])) {
  Dataset[,i] <- impute(Dataset[,i], "random")
}
I've already tried restarting R, clearing the cache with gc() and rm(list=ls()), and restarting my PC (multiple times, as my R does seem to have a problem with cache clearing). I even reinstalled RStudio.
This is my script:
library(table1)
library(Hmisc)
library(tidyverse)
library(gmodels)
library(ggpubr)
library(reshape2)
library(ggplot2)
library(lares)
library(dplyr)
library(plyr)
Dataset <- read.csv(.........., sep=";")
names(Dataset)[1] <- "ID"
Dataset <- as.data.frame(Dataset)
set.seed(1)
for (i in 2:length(Dataset[-1])) {
  Dataset[,i] <- impute(Dataset[,i], "random")
}
Error: Data should be a matrix or data frame
With one of your suggestions (Dataset[,i] does not return a data.frame, Dataset[i] does), this is the result:
Error: Data should contain at least two columns
With across(everything(), ...) nothing is imputed: there are no errors, but when I view the data after running across(), it is the same as before.
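The two different error messages line up with how base R subsetting works. A minimal sketch, assuming (as the messages suggest) that the impute() being called expects a whole data frame with at least two columns: `Dataset[, i]` hands it a plain vector, while `Dataset[i]` hands it a one-column data frame. The built-in mtcars data is used here purely for illustration.

```r
# Built-in example data, used only for illustration
df <- mtcars

# Single bracket with a comma drops a single column to a plain vector
col_as_vector <- df[, 2]

# Single bracket without a comma keeps a one-column data frame
col_as_df <- df[2]

# Double bracket also extracts the column as a plain vector
col_as_vector2 <- df[[2]]

is.data.frame(col_as_vector)              # FALSE
is.data.frame(col_as_df)                  # TRUE
ncol(col_as_df)                           # 1
identical(col_as_vector, col_as_vector2)  # TRUE
```

So neither form satisfies a function that wants a multi-column data frame, which points toward the function itself rather than the subsetting.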
And this is a sample of my dataset (I'm sorry, this is the best I can do to import it):
ID  Gender  Age  var1  var2  Var3  Var4   Var5
A   0       13   30    0     1     32.7   49.57
b   0       18   50    0     1     47.85  NA
c   0       NA   40    1     1     47.86  44.22
d   1       14   70    0     1     70.76  NA
e   1       14   80    1     1     36.06  62.27
f   0       16   60    1     NA    35.73  60.27
g   0       14   NA    1     0     57.94  60.28
h   0       19   50    0     0     60.88  37.8
i   0       13   30    1     0     54.29  67.47
j   0       15   50    1     1     54.30  60.27
EXTRA: using mtcars
Dataset<- mtcars
set.seed(1)
for (i in 2:length(Dataset[-1])) {
  Dataset[i] <- impute(Dataset[i], "random")
}
Error: Data should contain at least two columns
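One thing that may be worth checking, offered as an assumption rather than a diagnosis: the script attaches several packages after Hmisc, and if any of them also exports a function named impute(), the later one masks Hmisc's version for bare impute() calls. The sketch below shows how to see which package a bare call resolves to, and how to call Hmisc's function through its namespace. The vector `x` is a hypothetical example, not the asker's data.

```r
library(Hmisc)

# Which attached packages define something called impute?
# The first entry is the one a bare impute() call resolves to.
find("impute")

# Name of the namespace that the currently visible impute() belongs to
environmentName(environment(impute))

# Calling through the namespace sidesteps any masking
x <- c(1, NA, 3, 4, NA)   # hypothetical example vector
set.seed(1)
x_imputed <- Hmisc::impute(x, "random")

# Missing values are filled with values sampled from the observed ones
any(is.na(x_imputed))
```

Running `find("impute")` in the original session, with all packages attached, would show whether anything sits in front of package:Hmisc.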
Solution 1:[1]
Using the data frame Dataset from your edited question:
## to help others reproduce the issue:
Dataset <-
structure(list(ID = c("A", "b", "c", "d", "e", "f", "g", "h",
"i", "j"), Gender = c(0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L),
Age = c(13L, 18L, NA, 14L, 14L, 16L, 14L, 19L, 13L, 15L),
var1 = c(30L, 50L, 40L, 70L, 80L, 60L, NA, 50L, 30L, 50L),
var2 = c(0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L), Var3 = c(1L,
1L, 1L, 1L, 1L, NA, 0L, 0L, 0L, 1L), Var4 = c(32.7, 47.85,
47.86, 70.76, 36.06, 35.73, 57.94, 60.88, 54.29, 54.3), Var5 = c(49.57,
NA, 44.22, NA, 62.27, 60.27, 60.28, 37.8, 67.47, 60.27)), class = "data.frame", row.names = c(NA,
10L))
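A structure(...) call like the one above is what dput() prints, so a sample can be shared in a form that others can paste straight into R. A minimal sketch, using the built-in mtcars data as a stand-in for a real dataset:

```r
# A few rows of a built-in data frame, standing in for real data
sample_df <- head(mtcars, 3)

# dput() prints R code that recreates the object
dput(sample_df)

# Round trip: capture that code as text, then evaluate it again
code_lines <- capture.output(dput(sample_df))
rebuilt <- eval(parse(text = code_lines))

# The rebuilt object matches the original
isTRUE(all.equal(rebuilt, sample_df))
```

For a larger dataset, `dput(head(Dataset, 10))` keeps the pasted sample short while preserving column types and NAs.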
This is your input data frame Dataset (note the NAs):
## > glimpse(Dataset)
## Rows: 10
## Columns: 8
## $ ID <chr> "A", "b", "c", "d", "e", "f", "g", "h", "i", "j"
## $ Gender <int> 0, 0, 0, 1, 1, 0, 0, 0, 0, 0
## $ Age <int> 13, 18, NA, 14, 14, 16, 14, 19, 13, 15
## $ var1 <int> 30, 50, 40, 70, 80, 60, NA, 50, 30, 50
## $ var2 <int> 0, 0, 1, 0, 1, 1, 1, 0, 1, 1
## $ Var3 <int> 1, 1, 1, 1, 1, NA, 0, 0, 0, 1
## $ Var4 <dbl> 32.70, 47.85, 47.86, 70.76, 36.06, 35.73, 57.94, 60.88, 54.29, ~
## $ Var5 <dbl> 49.57, NA, 44.22, NA, 62.27, 60.27, 60.28, 37.80, 67.47, 60.27
You need the libraries {dplyr} and {Hmisc} (which provides impute()):
library(dplyr)
library(Hmisc)
Now do the imputation across all columns (= everything()):
Dataset_imputed <- Dataset %>%
  mutate(across(everything(),
                function(x) impute(x)
  ))
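Note that Hmisc's impute() fills with the median when it is given no second argument, whereas the question asks for "random". A hedged variant of the same across() idea is sketched below. It calls Hmisc::impute() through its namespace, limits the imputation to numeric columns, and converts the result back to a plain numeric vector. The data frame `toy` and its values are hypothetical.

```r
library(dplyr)
library(Hmisc)

# Small illustrative data frame (hypothetical values, not the asker's data)
toy <- data.frame(
  ID    = c("a", "b", "c", "d"),
  Age   = c(13L, NA, 14L, 16L),
  Score = c(49.5, 44.2, NA, 60.3)
)

set.seed(1)
toy_imputed <- toy %>%
  mutate(across(
    where(is.numeric),   # leave the character ID column untouched
    # "random" samples replacements from the observed values;
    # as.numeric() drops the "impute" class (integer columns become double)
    function(x) as.numeric(Hmisc::impute(x, "random"))
  ))

# Number of remaining NAs per column
colSums(is.na(toy_imputed))
```

Keeping the "impute" class, as the code above the sketch does, is also fine. Dropping it is a design choice that avoids surprises when later functions do not know that class.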
... there you go:
## > glimpse(Dataset_imputed)
## Rows: 10
## Columns: 8
## $ ID <chr> "A", "b", "c", "d", "e", "f", "g", "h", "i", "j"
## $ Gender <int> 0, 0, 0, 1, 1, 0, 0, 0, 0, 0
## $ Age <impute> 13, 18, 14, 14, 14, 16, 14, 19, 13, 15
## $ var1 <impute> 30, 50, 40, 70, 80, 60, 40, 50, 30, 50
## $ var2 <int> 0, 0, 1, 0, 1, 1, 1, 0, 1, 1
## $ Var3 <impute> 1, 1, 1, 1, 1, 1, 0, 0, 0, 1
## $ Var4 <dbl> 32.70, 47.85, 47.86, 70.76, 36.06, 35.73, 57.94, 60.88, 54.29, ~
## $ Var5 <impute> 49.57, 49.57, 44.22, 62.27, 62.27, 60.27, 60.28, 37.80, 67.47, ~
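For completeness, the original for loop can also be made to work in base R style. Two details in it are worth noting: `2:length(Dataset[-1])` stops one column short, because `Dataset[-1]` has one column fewer than `Dataset`, and `[[i]]` is the form that passes a plain vector to Hmisc::impute(). The sketch below assumes every column after the first is numeric. The toy `Dataset` is hypothetical.

```r
library(Hmisc)

# Hypothetical toy data standing in for the asker's Dataset
Dataset <- data.frame(
  ID   = c("a", "b", "c", "d", "e"),
  Age  = c(13, NA, 14, 16, 14),
  Var5 = c(49.57, NA, 44.22, NA, 62.27)
)

set.seed(1)
# seq_len(ncol(Dataset))[-1] visits columns 2..ncol, including the last one
for (i in seq_len(ncol(Dataset))[-1]) {
  # [[i]] extracts the column as a vector; the namespace prefix avoids masking
  # as.numeric() assumes a numeric column and drops the "impute" class
  Dataset[[i]] <- as.numeric(Hmisc::impute(Dataset[[i]], "random"))
}

# Number of remaining NAs per column
colSums(is.na(Dataset))
```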
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
