'If Many Columns Contains String then enter value for that row
I have a dataframe called bd that looks like that:
The dataframe is this Link Basedatos.
FAC_1 FAC_2 FAC_3 FAC_4 FAC_5 FAC_6 FAC_7 FAC_8 FAC_9 FAC_10 FAC_11
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Substrato de mala calidad (muy meteorizado) " Naturaleza del suelo ~ " M~ NA NA NA NA NA NA NA NA
2 Substrato de mala calidad (muy meteorizado) " Alternancia de rocas ~ " N~ " M~ " M~ NA NA NA NA NA NA
3 NA NA NA NA NA NA NA NA NA NA NA
4 Substrato de mala calidad (muy meteorizado) " Alternancia de rocas ~ " R~ " O~ " P~ " M~ NA NA NA NA NA
5 Substrato de mala calidad (muy meteorizado) " Alternancia de rocas ~ " R~ " O~ " N~ " P~ NA NA NA NA NA
6 Substrato de mala calidad (muy meteorizado) " Alternancia de rocas ~ " R~ " P~ " M~ NA NA NA NA NA NA
7 Substrato de mala calidad (muy meteorizado) " Alternancia de rocas ~ " R~ " N~ " P~ " M~ NA NA NA NA NA
8 Substrato de mala calidad (muy meteorizado) " Alternancia de rocas ~ " M~ " P~ " M~ NA NA NA NA NA NA
9 Substrato de mala calidad (muy meteorizado) " Alternancia de rocas ~ " R~ " N~ " P~ " M~ NA NA NA NA NA
10 Substrato de mala calidad (muy meteorizado) " Rocas muy fracturadas~ " O~ " N~ " M~ " P~ NA NA NA NA NA
So I need to create 11 new columns, the first column have to search all the values that contains "Substrato" in the eleven FAC_ variables, then replace with "1" if contains the string and "0" if not contains the string; the second column have to search all the values that contains "Alternancia" in the eleven FAC_ variables, then replace with "1" if contains the string and "0" if not contains the string, and the same for all the rest of the columns.
To achieve this my code is the following:
The vector containers is the string that I have to find in the dataframe bd .
vect <- 1:11 #index vector
variables <- paste("FAC", vect, sep = "_") # variables names
containers <- c("Substrato","Alternancia", "Presencia", "fracturadas","desfavorable",
"Naturaleza", "Material", "Pendiente", "Morfología", "escacez", "Otro") # strings to find
bd$var1 <- character() #empty new column
# Create the first new column with "Substrato" like string:
for (i in 1:length(vect)){
out <- if_else(grepl(containers[1], bd[ ,i]), "1", "0")
bd$var1<- c(var1, out)
}
#For the next columns I changed containers[1] by containers[2],[3],[4],....[11] and var1 by var2,3,4,..11.
But I have a problem in the code that not create the variables, I checked many times but I can not solve the problem.
I am going to feel so happy is someone can help me. Thank so much!!
Solution 1:[1]
1. If you want a single number in var (1 or 0)
1 = the corresponding string from containers was found at least once in all of the eleven FAC_ variables in the same row
Using the tidyverse:
library(tidyverse)
for (i in seq_along(containers)){
bd <- bd %>%
mutate(!!sym(paste0("var",i)) := apply(bd[,1:11], 1, function(vec){str_detect(vec, containers[i]) %>% any(na.rm=T) %>% as.numeric}))
}
Output
> bd
# A tibble: 35,279 x 22
FAC_1 FAC_2 FAC_3 FAC_4 FAC_5 FAC_6 FAC_7 FAC_8 FAC_9 FAC_10 FAC_11 var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11
<chr> <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Substra~ Natural~ "Morfol~ NA NA NA NA NA NA NA NA 1 0 0 0 0 1 0 0 0 0 0
2 Substra~ Alterna~ "Natura~ "Materi~ "Morfo~ NA NA NA NA NA NA 1 1 0 0 0 1 1 0 0 0 0
3 NA NA NA NA NA NA NA NA NA NA NA 0 0 0 0 0 0 0 0 0 0 0
4 Substra~ Alterna~ "Rocas ~ "Orient~ "Pendi~ "Morf~ NA NA NA NA NA 1 1 0 1 1 0 0 1 0 0 0
5 Substra~ Alterna~ "Rocas ~ "Orient~ "Natur~ "Pend~ NA NA NA NA NA 1 1 0 1 1 1 0 1 0 0 0
6 Substra~ Alterna~ "Rocas ~ "Pendie~ "Morfo~ NA NA NA NA NA NA 1 1 0 1 0 0 0 1 0 0 0
7 Substra~ Alterna~ "Rocas ~ "Natura~ "Pendi~ "Morf~ NA NA NA NA NA 1 1 0 1 0 1 0 1 0 0 0
8 Substra~ Alterna~ "Materi~ "Pendie~ "Morfo~ NA NA NA NA NA NA 1 1 0 0 0 0 1 1 0 0 0
9 Substra~ Alterna~ "Rocas ~ "Natura~ "Pendi~ "Morf~ NA NA NA NA NA 1 1 0 1 0 1 0 1 0 0 0
10 Substra~ Rocas m~ "Orient~ "Natura~ "Mater~ "Pend~ NA NA NA NA NA 1 0 0 1 1 1 1 1 0 0 0
# ... with 35,269 more rows
2. If you want a 1 or 0 for each of the eleven FAC_, and this in every var
I put the 1 and 0 in a list, so var1 is a vector of lists. Each element is a list with 11 numbers.
for (i in seq_along(containers)){
bd <- bd %>%
mutate(!!sym(paste0("var",i)) := apply(bd[,1:11], 1, function(vec){map(vec, ~grepl(containers[i], .) %>% as.numeric)}))
}
The calculation takes a bit of time but I do find
Output
> bd
# A tibble: 35,279 x 22
FAC_1 FAC_2 FAC_3 FAC_4 FAC_5 FAC_6 FAC_7 FAC_8 FAC_9 FAC_10 FAC_11 var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11
<chr> <chr> <chr> <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl> <list> <list> <list> <list> <lis> <lis> <lis> <lis> <lis> <lis> <lis>
1 Substr~ Natura~ "Morfo~ NA NA NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
2 Substr~ Altern~ "Natur~ "Mater~ "Morfo~ NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
3 NA NA NA NA NA NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
4 Substr~ Altern~ "Rocas~ "Orien~ "Pendi~ "Morf~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
5 Substr~ Altern~ "Rocas~ "Orien~ "Natur~ "Pend~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
6 Substr~ Altern~ "Rocas~ "Pendi~ "Morfo~ NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
7 Substr~ Altern~ "Rocas~ "Natur~ "Pendi~ "Morf~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
8 Substr~ Altern~ "Mater~ "Pendi~ "Morfo~ NA NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
9 Substr~ Altern~ "Rocas~ "Natur~ "Pendi~ "Morf~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
10 Substr~ Rocas ~ "Orien~ "Natur~ "Mater~ "Pend~ NA NA NA NA NA <name~ <name~ <name~ <name~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~ <nam~
# ... with 35,269 more rows
> bd$var1[1]
[[1]]
[[1]]$FAC_1
[1] 1
[[1]]$FAC_2
[1] 0
[[1]]$FAC_3
[1] 0
[[1]]$FAC_4
[1] 0
[[1]]$FAC_5
[1] 0
[[1]]$FAC_6
[1] 0
[[1]]$FAC_7
[1] 0
[[1]]$FAC_8
[1] 0
[[1]]$FAC_9
[1] 0
[[1]]$FAC_10
[1] 0
[[1]]$FAC_11
[1] 0
Edit: if you want the whole string where we found the word in the cell
If you are certain each word can only appear once in the row, you can do
for (i in seq_along(containers)){
bd <- bd %>%
mutate(!!sym(paste0("var",i)) := apply(bd[,1:11], 1, function(vec){ifelse(str_detect(vec, containers[i]) %>% any(na.rm=T),
str_subset(vec, containers[i]),
NA)}))
}
with the value of the cells filled if the word was found, NA if the word was not found. If you want to change the value when the word was not found, modify NA in the ifelse function, for example with "Not found".
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
