Create a column in the original dataset to indicate whether the row was drawn in a random stratified sample
I would like to draw a stratified random sample (n = 375) from a dataset. Based on the stratified random sample, I would like to add a column to the original dataset indicating whether the row is in the stratified random sample (1) or not (0).
iris <- iris
# Get a random stratified sample
library(tidyverse)
stratified <- iris %>%
  group_by(Species) %>%
  sample_n(size = 1)
# The final result I would like to get:
iris$sample3 <- 0
iris[21,6] <- 1
iris[65,6] <- 1
iris[106,6] <- 1
After doing that, I would like to repeat the procedure by drawing a second stratified random sample (n = 125) from my first stratified random sample (n = 375) and repeat the creation of a column.
Solution 1:[1]
You can add a column to your data frame that has the required number of 1s per group (and 0 otherwise).
set.seed(1)
samples <- 1
sample1 <- iris %>%
  group_by(Species) %>%
  mutate(sampled = as.numeric(row_number() %in% sample(n(), samples)))
sample1
#> # A tibble: 150 x 6
#> # Groups:   Species [3]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species sampled
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>     <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa        0
#>  2          4.9           3          1.4         0.2 setosa        0
#>  3          4.7         3.2          1.3         0.2 setosa        0
#>  4          4.6         3.1          1.5         0.2 setosa        1
#>  5            5         3.6          1.4         0.2 setosa        0
#>  6          5.4         3.9          1.7         0.4 setosa        0
#>  7          4.6         3.4          1.4         0.3 setosa        0
#>  8            5         3.4          1.5         0.2 setosa        0
#>  9          4.4         2.9          1.4         0.2 setosa        0
#> 10          4.9         3.1          1.5         0.1 setosa        0
#> # ... with 140 more rows
To get the sampled values, simply filter to find the 1s:
sample1 %>% filter(sampled == 1)
#> # A tibble: 3 x 6
#> # Groups:   Species [3]
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species    sampled
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>        <dbl>
#> 1          4.6         3.1          1.5         0.2 setosa           1
#> 2          5.6           3          4.1         1.3 versicolor       1
#> 3          6.3         3.3            6         2.5 virginica        1
Created on 2022-05-16 by the reprex package (v2.0.1)
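The question also asks for a second stratified sample drawn from the first one. The answer above does not cover that step, so what follows is only a sketch of one way to extend the same idea: flag the first sample, then sample the second flag from the rows already flagged within each group. The per-group sizes n_first and n_second and the column names sample1 and sample2 are made up for illustration; with the real data they would be whatever per-stratum counts add up to 375 and 125.

```r
library(dplyr)

set.seed(1)
n_first  <- 10  # hypothetical per-group size of the first sample
n_second <- 4   # hypothetical per-group size of the second sample

two_stage <- iris %>%
  group_by(Species) %>%
  mutate(
    # first sample: n_first rows per group, flagged with 1
    sample1 = as.numeric(row_number() %in% sample(n(), n_first)),
    # second sample: drawn only from the rows flagged in sample1.
    # Caveat: sample(x, size) treats a length-one numeric x as 1:x,
    # so this form assumes n_first is greater than 1.
    sample2 = as.numeric(
      row_number() %in% sample(which(sample1 == 1), n_second)
    )
  ) %>%
  ungroup()

# Counts per group for both flags
two_stage %>%
  group_by(Species) %>%
  summarise(first = sum(sample1), second = sum(sample2))
```

Because sample2 is drawn from which(sample1 == 1), every row with sample2 equal to 1 also has sample1 equal to 1, so the second sample is nested inside the first. Both flags stay on the original data frame, and filter(sample2 == 1) returns the second sample itself.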
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
