Make several data frames with a pattern
Hello everyone. I hope you are all happy, and I need your help for my own happiness.
I asked a similar question less than a day ago, but I am stuck on a similar error. Here is what I need to do: data_01 is a data frame with 2277 rows and 37 columns. My plan was to split data_01 into several data frames (and remove the data frames with fewer than 100 rows).
data_01_00<-data_01 #family 2277
data_01_01<-data_01_00 %>% filter(rowSums(data_01_00[,1:39])==1 & data_01_00[,1]==1)
data_01_02<-data_01_00 %>% filter(rowSums(data_01_00[,1:2])==2 & data_01_00[,2]==1)
data_01_03<-data_01_00 %>% filter(rowSums(data_01_00[,1:3])==2 & data_01_00[,3]==1)
data_01_05<-data_01_00 %>% filter(rowSums(data_01_00[,1:5])==2 & data_01_00[,5]==1)
data_01_06<-data_01_00 %>% filter(rowSums(data_01_00[,1:6])==2 & data_01_00[,6]==1)
data_01_08<-data_01_00 %>% filter(rowSums(data_01_00[,1:8])==2 & data_01_00[,8]==1)
Based on this pattern I tried the following. I did not use a for loop because data_01_04 and data_01_07 are removed, so I decided to use a user-defined function.
family<- vector(mode = "list", length = 40)
family[1]<-list(data_01_00)
family[2]<-list(data_01_01)
testfunc<-function(i){
family[i]<-data_01_00 %>% filter(paste0('rowSums(data_01_00[,1:',i,'])==2 & data_01_00[,',i,']==1'))
}
It failed. If nothing had gone wrong, I would have written code like this:
testfunc(3)
...
testfunc(8)
(Actually, the code would have to be split into 39 such calls.)
What should I do?
Solution 1:[1]
If you are just filtering by column numbers, I don’t think there’s any need to complicate matters with paste0. You can just use base R for this and it will probably be faster than dplyr anyway. As a note, it’s always much easier to help if you provide sample data. Solutions will match your data that way. Below, I simulate some data for this problem. I've changed i to j as traditionally j refers to columns and i refers to rows.
set.seed(1)
data_01_00 <- matrix(sample(0:1, size = 99, prob = c(0.90, 0.1), replace = TRUE),
                     nrow = 2277, ncol = 39, byrow = TRUE)
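testfunc <- function(j){
  # Column 1 is a special case: the row sum is taken over all 39 columns.
  # Otherwise keep rows where columns 1..j sum to 1 and column j is 1.
  # drop = FALSE keeps the result a matrix even if only one row matches.
  if (j == 1) family <- data_01_00[rowSums(data_01_00[ , 1:39]) == 1 & data_01_00[ , 1] == 1 , , drop = FALSE]
  else family <- data_01_00[rowSums(data_01_00[ , 1:j]) == 1 & data_01_00[ , j] == 1 , , drop = FALSE]
  return(family)
}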
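The function above compares the row sum to 1, while the code in the question compares it to 2 for columns 2 and up. If the question's condition is what you want, one option is to make the target sum an argument. This is only a sketch: filter_family, dat and target are names I made up for illustration, the toy matrix is invented, and the special case for column 1 is left out.

```r
# Hypothetical variant of testfunc(): `dat` and `target` are illustrative
# parameters, not part of the function defined above.
filter_family <- function(dat, j, target = 1) {
  # Keep rows whose first j columns sum to `target` and whose column j is 1.
  keep <- rowSums(dat[, 1:j, drop = FALSE]) == target & dat[, j] == 1
  dat[keep, , drop = FALSE]
}

# Small toy matrix so the result is easy to check by eye.
toy <- matrix(c(1, 0, 0,
                1, 1, 0,
                0, 1, 0,
                1, 0, 1),
              ncol = 3, byrow = TRUE)

filter_family(toy, j = 2, target = 2)  # rows with a 1 in both of the first two columns
filter_family(toy, j = 2, target = 1)  # rows whose first 1 is in column 2
```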
It’s probably easiest to use lapply and collect all your data.frames into a list. I’ve removed 4 & 7 as you’ve done.
x <- 1:39
x <- x[!x %in% c(4, 7)]
mydat <- lapply(x, testfunc)
length(mydat)
#> [1] 37
That way you can easily filter out those with fewer than 100 rows if you want.
mydat <- lapply(1:length(mydat), function(x) if(nrow(mydat[[x]]) >= 100) mydat[[x]])
mydat <- mydat[lengths(mydat) > 0]
length(mydat)
#> [1] 6
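The same row-count filter can also be written with base R's Filter(), which keeps the list elements for which a predicate returns TRUE. The sketch below uses a made-up toy_list in place of mydat so that it runs on its own.

```r
# Toy list standing in for `mydat`: three matrices with 150, 20 and 100 rows.
toy_list <- list(matrix(0, nrow = 150, ncol = 2),
                 matrix(0, nrow = 20,  ncol = 2),
                 matrix(0, nrow = 100, ncol = 2))

# Filter() keeps the elements for which the predicate returns TRUE.
big_enough <- Filter(function(m) nrow(m) >= 100, toy_list)

length(big_enough)                    # number of elements that survived
vapply(big_enough, nrow, integer(1))  # row counts of the survivors
```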
However, you can produce the data.frames separately if that’s what you need.
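One way to do that, as a rough sketch, is to name the list elements after their column index and then copy them into an environment with list2env(). The toy x and mydat below are stand-ins so the example is self-contained, and the data_01_XX naming simply mirrors the question. The names should be attached right after lapply(x, testfunc), before any elements are dropped, because afterwards the list positions no longer line up with x.

```r
# Toy stand-ins: `x` holds the column indices, `mydat` one matrix per index.
x <- c(1L, 2L, 3L, 5L)
mydat <- lapply(x, function(j) matrix(j, nrow = 2, ncol = 2))

# Name each element after its column index, zero-padded to two digits.
names(mydat) <- sprintf("data_01_%02d", x)
names(mydat)

# Optionally copy the named elements into an environment as separate objects.
# A fresh environment is used here to avoid cluttering the global one.
env <- new.env()
list2env(mydat, envir = env)
ls(env)
```

Passing envir = globalenv() instead would create the objects in your workspace, although keeping them together in the list is usually easier to work with.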
All this said, you could have used a loop in the same way I've used this function. 4 & 7 not being in the sequence isn't a limitation for the loop. You would just remove 4 & 7 from the sequence and loop over the sequence. It's probably better to use a function anyway for clarity and efficiency.
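For completeness, here is roughly what the loop version could look like. It is a sketch on a small invented matrix (toy), it skips column 2 to mimic skipping 4 and 7, and it uses the simple "row sum of the first j columns equals 1" condition without the special case for column 1.

```r
# Toy data: 6 rows, 4 columns of 0/1 values (illustrative only).
toy <- matrix(c(1, 0, 0, 0,
                0, 1, 0, 0,
                0, 1, 1, 0,
                0, 0, 1, 0,
                0, 0, 0, 1,
                0, 0, 1, 1),
              ncol = 4, byrow = TRUE)

cols <- setdiff(1:4, 2)                  # skip column 2, like 4 and 7 in the question
result <- vector("list", length(cols))   # preallocate one slot per kept column

for (k in seq_along(cols)) {
  j <- cols[k]
  # Rows whose first 1 appears in column j.
  keep <- rowSums(toy[, 1:j, drop = FALSE]) == 1 & toy[, j] == 1
  result[[k]] <- toy[keep, , drop = FALSE]
}
names(result) <- cols

vapply(result, nrow, integer(1))         # rows kept for each column index
```

Looping over seq_along(cols) rather than over cols directly keeps the list index k separate from the column index j, so the gaps in the sequence do not leave empty slots in result.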
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | TrainingPizza |
