Generate a random variable by id in R
I want to create a random ID variable based on an existing ID, so that observations with the same id get the same random ID. Here is an example:
id var1 var2
1 a 1
5 g 35
1 hf 658
2 f 576
9 d 54546
2 dg 76
3 g 5
3 g 5
5 gg 56
6 g 456
8v g 6
9 e 778795
The expected result is:
id var1 var2 id random
1 a 1 9
5 g 35 1
1 hf 658 9
2 f 576 8
9 d 54546 3
2 dg 76 8
3 g 5 7
3 g 5 7
5 gg 56 1
6 g 456 5
8v g 6 4
9 e 778795 3
Solution 1:[1]
Here is a base R way with ave.
The random numbers are drawn from 1 to nrow(dat). Calling sample with size = 1 returns a single number per group, which ave recycles across all rows of that group, so every row with the same id gets the same random number.
set.seed(2022)
dat$random <- with(dat, ave(id, id, FUN = \(x) sample(nrow(dat), size = 1)))
Created on 2022-03-01 by the reprex package (v2.0.1)
Each id has only one random number.
split(data.frame(id = dat$id, random = dat$random), dat$id)
#> $`1`
#> id random
#> 1 1 4
#> 3 1 4
#>
#> $`2`
#> id random
#> 4 2 3
#> 6 2 3
#>
#> $`3`
#> id random
#> 7 3 7
#> 8 3 7
#>
#> $`5`
#> id random
#> 2 5 11
#> 9 5 11
#>
#> $`6`
#> id random
#> 10 6 4
#>
#> $`8v`
#> id random
#> 11 8v 6
#>
#> $`9`
#> id random
#> 5 9 12
#> 12 9 12
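Note that in the output above ids 1 and 6 both received 4: each group draws independently, so different ids can end up with the same number. If distinct ids must get distinct random IDs, as in the expected result of the question, one option is to shuffle the unique ids and use each id's position in the shuffle. This is a minimal base R sketch, not part of the original answer; the vector id is copied from the question, and shuffled and random are names chosen for illustration.

```r
# Example ids, copied from the question (character because of "8v")
id <- c("1", "5", "1", "2", "9", "2", "3", "3", "5", "6", "8v", "9")

set.seed(2022)  # only for reproducibility

# Shuffle the distinct ids; an id's position in the shuffle is its random ID
shuffled <- sample(unique(id))
random <- match(id, shuffled)

# One value per id, and no value shared between two ids
tapply(random, id, unique)
```

With a data frame this becomes dat$random <- match(dat$id, sample(unique(dat$id))). The random IDs then run from 1 to the number of distinct ids rather than from 1 to nrow(dat).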
The random numbers are also uniformly distributed. To see this, repeat the process above 10000 times, tabulate the results and draw a bar plot.
zz <- replicate(10000,
with(dat, ave(id, id, FUN = \(x) sample(nrow(dat), size = 1))))
barplot(table(as.integer(zz)))
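A bar plot only gives a visual impression. As a rough numerical check, a chi-squared goodness-of-fit test could be run on the counts. The sketch below is an assumption-laden simplification, not part of the original answer: it hard-codes 12 rows as in the example data, and it draws one number per replicate directly with sample instead of going through ave, so that the draws are independent. The names n, draws, counts and res are made up for the example.

```r
set.seed(2022)  # only for reproducibility

n <- 12  # number of rows in the example data (assumed here)

# One independent draw from 1..n per replicate
draws <- replicate(10000, sample(n, size = 1))

# Count how often each value occurred, keeping values that never appeared
counts <- table(factor(draws, levels = seq_len(n)))

# Goodness-of-fit test against equal probabilities for all n values
res <- chisq.test(counts)
res$p.value
```

A large p-value is consistent with a uniform distribution; it does not prove it.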

Data
dat <- read.table(header = TRUE, text = "id var1 var2
1 a 1
5 g 35
1 hf 658
2 f 576
9 d 54546
2 dg 76
3 g 5
3 g 5
5 gg 56
6 g 456
8v g 6
9 e 778795")
Solution 2:[2]
Create a random group ID for each unique id, then join it back to the original data.
library(data.table)
library(tidyverse)
dt <- fread("
id var1 var2
1 a 1
5 g 35
1 hf 658
2 f 576
9 d 54546
2 dg 76
3 g 5
3 g 5
5 gg 56
6 g 456
8v g 6
9 e 778795
")
uq <- unique(dt$id)                        # distinct ids
set.seed(1)
uqid <- sample(length(uq))                 # random permutation of 1..number of distinct ids
dt1 <- data.table(id = uq, random = uqid)  # mapping table: id -> random ID
left_join(dt, dt1, by = "id" )
> left_join(dt, dt1, by = "id" )
id var1 var2 random
1: 1 a 1 1
2: 5 g 35 4
3: 1 hf 658 1
4: 2 f 576 7
5: 9 d 54546 2
6: 2 dg 76 7
7: 3 g 5 5
8: 3 g 5 5
9: 5 gg 56 4
10: 6 g 456 3
11: 8v g 6 6
12: 9 e 778795 2
This works like a mapping (lookup) table for the new column, applied with a join.
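The same mapping-table idea can also be sketched without a join, using a named vector as the lookup. This is a base R illustration under stated assumptions, not part of the original answer: ids is copied from the question, and lookup and random are names chosen for the example.

```r
# Example ids, copied from the question (character because of "8v")
ids <- c("1", "5", "1", "2", "9", "2", "3", "3", "5", "6", "8v", "9")

uq <- unique(ids)  # distinct ids
set.seed(1)        # only for reproducibility

# Named vector: names are the ids, values are a random permutation of 1..length(uq)
lookup <- setNames(sample(length(uq)), uq)

# Index the lookup by id to get one random ID per row
random <- unname(lookup[ids])

data.frame(id = ids, random = random)
```

Indexing a named vector by name plays the role of the join here; the join in the answer is the more natural choice once the mapping table has several columns.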
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rui Barradas |
| Solution 2 | |
