Optimize random sampling with filter and map functions

I want to randomly retrieve a list of car mpg values based on some predefined fuel types (carb values). The code below works but is slow. Is there a better way to apply this approach to a data set containing a million rows?

library(dplyr)
library(purrr)

list_carbs <- c(1, 3, 4, 4)

get_sample_cars <- function(list_carbs) {
  # One filtered data frame per requested carb value
  filtered_cars <- map(list_carbs, ~ mtcars %>% filter(carb == .x))

  # Draw a single mpg value from each filtered data frame
  map(filtered_cars, ~ sample(.x$mpg, size = 1))
}

mpg_cars <- get_sample_cars(list_carbs)

Here are two examples of the expected result:

mpg    carb
27.3    1
16.4    3
19.2    4
10.4    4

mpg    carb
32.4    1
17.3    3
19.2    4
14.7    4


Solution 1:[1]

filter(mtcars, carb %in% list_carbs) %>%
   group_by(carb) %>%
   slice_sample(n = 1)

# A tibble: 3 x 11
# Groups:   carb [3]
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1
2  16.4     8 276.    180  3.07  4.07  17.4     0     0     3     3
3  13.3     8 350     245  3.73  3.84  15.4     0     0     3     4
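One point worth noting (an observation, not part of the original answer): because `slice_sample` draws once per group, duplicated entries in `list_carbs` collapse to a single row each, which is why the output above has three rows rather than four. A quick sketch illustrating this:

```r
library(dplyr)

list_carbs <- c(1, 3, 4, 4)

out <- filter(mtcars, carb %in% list_carbs) %>%
  group_by(carb) %>%
  slice_sample(n = 1)

nrow(out)                    # 3: one row per distinct carb, not per request
length(unique(list_carbs))   # 3
```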

EDIT:

mtcars %>%
  select(carb, mpg) %>%
  nest_by(carb) %>%
  filter(carb %in% list_carbs) %>%
  mutate(data = map2(data, table(list_carbs)[as.character(carb)],
                     ~ sample(.x, .y))) %>%
  unnest(data)

# A tibble: 4 x 2
# Groups:   carb [3]
   carb  data
  <dbl> <dbl>
1     1  22.8
2     3  16.4
3     4  14.3
4     4  14.7
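If dplyr overhead is a concern at a million rows, a base-R sketch of the same idea is to split `mpg` by `carb` once up front and then sample each pre-split group as many times as `list_carbs` requests. The names `mpg_by_carb` and `sampled` are illustrative, not from the original answers:

```r
list_carbs <- c(1, 3, 4, 4)

# One pass over the data: a named list of mpg vectors, keyed by carb value
mpg_by_carb <- split(mtcars$mpg, mtcars$carb)

# One draw per entry of list_carbs, including duplicates.
# (Caveat: sample(x, 1) misbehaves when x has length 1; none of the
# requested carb groups in mtcars has a single row.)
sampled <- vapply(
  as.character(list_carbs),
  function(cb) sample(mpg_by_carb[[cb]], size = 1),
  numeric(1)
)

res <- data.frame(carb = list_carbs, mpg = sampled)
res
```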

Solution 2:[2]

You can probably simplify your code to just this:

mpg_cars <- sample(mtcars$mpg[mtcars$carb %in% list_carbs], size = 3)

That is to say, you can subset the desired column however you like and then sample from the remaining filtered values.
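One caveat worth flagging (an observation, not part of the original answer): this draws from the pooled mpg values of all matching rows, so it does not guarantee one value per entry of `list_carbs`. A sketch contrasting the pooled draw with a per-entry draw:

```r
list_carbs <- c(1, 3, 4, 4)

# Pooled draw: any 4 mpg values whose carb is 1, 3, or 4
pooled <- sample(mtcars$mpg[mtcars$carb %in% list_carbs],
                 size = length(list_carbs))

# Per-entry draw: exactly one mpg value for each element of list_carbs
per_entry <- sapply(list_carbs,
                    function(cb) sample(mtcars$mpg[mtcars$carb == cb], size = 1))

length(pooled)     # 4
length(per_entry)  # 4
```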

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
