R: Avoiding rowwise() and looking for a faster alternative
I have two list-columns, one holding labels and one holding values. I want to combine them and use mutate() to add the values to the existing dataset as five new columns named after the labels.
Here is my example code:
library(dplyr)
library(magrittr)  # set_names() is taken from magrittr here; purrr exports one too

vector1 <- list(c("Reply", "Reshare", "Like", "Share", "Search"),
                c("Reply", "Reshare", "Like", "Share", "Search"),
                c("Reply", "Reshare", "Like", "Share", "Search"))
vector2 <- list(c(1, 2, 6, 3, 4), c(3, 7, 9, 2, 4), c(5, 2, 8, 4, 0))

tibble(vector1 = vector1,
       vector2 = vector2) %>%
  rowwise() %>%
  mutate(vector2 |> set_names(vector1) |> as.list() |> data.frame())
# A tibble: 3 x 7
# Rowwise:
vector1 vector2 Reply Reshare Like Share Search
<list> <list> <dbl> <dbl> <dbl> <dbl> <dbl>
1 <chr [5]> <dbl [5]> 1 2 6 3 4
2 <chr [5]> <dbl [5]> 3 7 9 2 4
3 <chr [5]> <dbl [5]> 5 2 8 4 0
This works so far. However, my real dataset is very large, and the rowwise() solution is very slow on it.
If I omit rowwise(), I get an error message.
I suspect the error occurs because I convert the vectors to a list with as.list(), which mutate() does not seem to handle when it operates on the whole column at once.
Ideally, rowwise() would be dropped and only the code inside mutate() would change.
Can anyone provide a faster solution?
Solution 1:[1]
If vector1 always contains the same values in the same order, as in the example, this can be done more simply in base R:
do.call(rbind, vector2) |>
  as.data.frame() |>
  setNames(vector1[[1]])
# Reply Reshare Like Share Search
#1 1 2 6 3 4
#2 3 7 9 2 4
#3 5 2 8 4 0
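The result above contains only the new columns. As a minimal sketch of attaching them to the existing dataset, the snippet below uses dplyr's bind_cols(); the names tb, new_cols, and result are illustrative, and the approach still assumes every element of vector1 has the same labels in the same order.

```r
library(dplyr)

vector1 <- list(c("Reply", "Reshare", "Like", "Share", "Search"),
                c("Reply", "Reshare", "Like", "Share", "Search"),
                c("Reply", "Reshare", "Like", "Share", "Search"))
vector2 <- list(c(1, 2, 6, 3, 4), c(3, 7, 9, 2, 4), c(5, 2, 8, 4, 0))

# The existing dataset with the two list-columns (illustrative name)
tb <- tibble(vector1 = vector1, vector2 = vector2)

# Stack the value vectors into a matrix, then name the columns once.
# Assumes all label vectors are identical, so the first one is used.
new_cols <- do.call(rbind, tb$vector2) |>
  as.data.frame() |>
  setNames(tb$vector1[[1]])

# Append the five new columns to the original two
result <- bind_cols(tb, new_cols)
```

Because the work is done once on the whole column rather than per row, this avoids rowwise() entirely.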
Solution 2:[2]
I suggest using mapply():
library(dplyr)
library(magrittr)

tibble(vector1 = vector1,
       vector2 = vector2) %>%
  mutate(mapply(set_names, vector2, vector1, SIMPLIFY = FALSE) %>%
           do.call(rbind, .) %>%
           data.frame())
# A tibble: 3 × 7
vector1 vector2 Reply Reshare Like Share Search
<list> <list> <dbl> <dbl> <dbl> <dbl> <dbl>
1 <chr [5]> <dbl [5]> 1 2 6 3 4
2 <chr [5]> <dbl [5]> 3 7 9 2 4
3 <chr [5]> <dbl [5]> 5 2 8 4 0
Here is a benchmark that compares rowwise() against mapply() on a dataset of 100 rows with shuffled labels:
library(microbenchmark)

# 100 label vectors, each a random permutation of the five labels
vector1 <- replicate(n = 100,
                     expr = sample(c("Reply", "Reshare", "Like", "Share", "Search"),
                                   5, replace = FALSE),
                     simplify = FALSE)
# 100 value vectors of length 5
vector2 <- replicate(n = 100, expr = rnorm(5), simplify = FALSE)
tb <- tibble(vector1 = vector1, vector2 = vector2)

microbenchmark(
  mapply = tb |>
    mutate(mapply(set_names, vector2, vector1, SIMPLIFY = FALSE) %>%
             do.call(rbind, .) |>
             data.frame()),
  rowwise = tb %>%
    rowwise() %>%
    mutate(vector2 |> set_names(vector1) |> as.list() |> data.frame())
)
Unit: milliseconds
    expr       min        lq      mean    median        uq       max neval
  mapply  2.439877  2.487191  2.630114  2.512208  2.576073  5.990312   100
 rowwise 37.309123 37.775255 39.386047 38.193196 41.221624 44.088820   100
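One caveat is worth checking before relying on this with shuffled labels: base rbind() combines atomic vectors by position and takes the column names from the first vector, so it does not line values up by name. The following is a minimal sketch, not part of the original answer, that reorders each value vector to a common label order before binding. The names labels, aligned, and new_cols are illustrative, and the sketch assumes every row contains the same full set of labels.

```r
# Assumed common label order for the output columns
labels <- c("Reply", "Reshare", "Like", "Share", "Search")

# Two rows whose labels come in different orders
vector1 <- list(c("Like", "Reply", "Search", "Share", "Reshare"),
                c("Reply", "Reshare", "Like", "Share", "Search"))
vector2 <- list(c(6, 1, 4, 3, 2),
                c(3, 7, 9, 2, 4))

# For each row, pick the values in the order given by `labels`
aligned <- mapply(function(values, nms) values[match(labels, nms)],
                  vector2, vector1, SIMPLIFY = FALSE)

# Now positional binding is safe, because every vector follows `labels`
new_cols <- do.call(rbind, aligned) |>
  as.data.frame() |>
  setNames(labels)
```

Comparing a few rows of the result against the rowwise() output is a quick way to confirm that both approaches agree on your own data.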
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ronak Shah |
| Solution 2 |
