Optimize random sampling with filter and map functions
I want to randomly retrieve a list of car mpg values based on some predefined carb values. Here is code that works but is slow. Is there a better way to apply this approach to a data set containing a million rows?
# requires dplyr and purrr
library(dplyr)
library(purrr)

list_carbs <- c(1, 3, 4, 4)

get_sample_cars <- function(list_carbs) {
  filtered_cars <- map(list_carbs, ~ mtcars %>% filter(carb == .x))
  map(filtered_cars, ~ sample(.x$mpg, size = 1))
}

mpg_cars <- get_sample_cars(list_carbs)
Here are two examples of the expected results:
mpg carb
27.3 1
16.4 3
19.2 4
10.4 4
mpg carb
32.4 1
17.3 3
19.2 4
14.7 4
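Each map(list_carbs, ~ filter(...)) pass rescans the whole table, once per requested carb. One common speed-up is to split mpg by carb a single time and then draw from the precomputed buckets. A minimal base-R sketch of that idea (mpg_by_carb is a name introduced here, not from the original post):

```r
list_carbs <- c(1, 3, 4, 4)

# Pre-split mpg values by carb once; each subsequent lookup is then a
# cheap list access instead of a full-table filter.
mpg_by_carb <- split(mtcars$mpg, mtcars$carb)

# sample(x, 1) misbehaves when length(x) == 1 (it samples from 1:x),
# so index with sample.int() instead.
mpg_cars <- vapply(as.character(list_carbs), function(cb) {
  x <- mpg_by_carb[[cb]]
  x[sample.int(length(x), 1L)]
}, numeric(1))
```

The split happens once, so repeated carb values (like 4 here) cost one list lookup each instead of one scan each.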
Solution 1:
filter(mtcars, carb %in% list_carbs) %>%
group_by(carb) %>%
slice_sample(n = 1)
# A tibble: 3 x 11
# Groups: carb [3]
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 33.9 4 71.1 65 4.22 1.84 19.9 1 1 4 1
2 16.4 8 276. 180 3.07 4.07 17.4 0 0 3 3
3 13.3 8 350 245 3.73 3.84 15.4 0 0 3 4
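Note that slice_sample(n = 1) returns only one row per unique carb (three rows above, while the question expects four, with carb 4 twice). One way to honor repeated requests in a single grouped pass is to join a per-carb request count and sample that many rows from each group. A sketch assuming dplyr >= 1.0 (draws and n_draws are names introduced here):

```r
library(dplyr)

list_carbs <- c(1, 3, 4, 4)

# How many draws each carb value needs (carb 4 is requested twice).
draws <- as.data.frame(table(carb = list_carbs), responseName = "n_draws")
draws$carb <- as.numeric(as.character(draws$carb))

result <- mtcars %>%
  inner_join(draws, by = "carb") %>%            # keep only requested carbs
  group_by(carb) %>%
  group_modify(~ slice_sample(.x, n = .x$n_draws[1])) %>%  # n draws per group
  ungroup() %>%
  select(carb, mpg)
```

This scans the table once regardless of how many carb values are requested, which matters at the million-row scale from the question.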
EDIT: to honor repeated carb values (e.g. carb 4 requested twice), count the requests per carb and sample that many values from each nested group:
# also requires tidyr for unnest()
mtcars %>%
  select(carb, mpg) %>%
  nest_by(carb) %>%
  filter(carb %in% list_carbs) %>%
  mutate(data = map2(data, table(list_carbs)[as.character(carb)],
                     ~ sample(.x, .y))) %>%
  unnest(data)
# A tibble: 4 x 2
# Groups: carb [3]
carb data
<dbl> <dbl>
1 1 22.8
2 3 16.4
3 4 14.3
4 4 14.7
Solution 2:
You can probably simplify your code using this:
mpg_cars <- sample(mtcars$mpg[mtcars$carb %in% list_carbs], size = 3)
That is to say, you can subset the desired column however you like and then sample from the remaining filtered values. Note that this draws from the pooled mpg values of all matching rows, so it returns 3 values total rather than one per requested carb.
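For the million-row scale mentioned in the question, a data.table variant (a different package than the dplyr/purrr code above, offered as an alternative sketch) keeps the per-carb grouping while sampling in one grouped pass:

```r
library(data.table)

list_carbs <- c(1, 3, 4, 4)
dt <- as.data.table(mtcars)

# One draw per unique requested carb, computed group-wise in a single
# pass; sample.int avoids the sample(x, 1) length-1 pitfall.
res <- dt[carb %in% list_carbs,
          .(mpg = mpg[sample.int(.N, 1L)]),
          by = carb]
```

Setting a key on carb (setkey(dt, carb)) would additionally make the carb filter a binary search rather than a vector scan on very large tables.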
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
