Efficient NumPy multiple sampling which results in a matrix
I would like to create a 2D NumPy matrix in which each row is a sample drawn without replacement from a larger population.
I've created the following code snippet:
import numpy as np
full_population = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
number_of_iterations = 8
drawn_observations = 6
rng = np.random.default_rng()
for single_draw in range(number_of_iterations):
    # One independent draw without replacement per iteration
    indices = rng.choice(a=full_population, size=drawn_observations, replace=False, shuffle=True)
However, this code runs serially and is too slow for my needs.
I've tried to look it up; this vectorized question seems close, but it is not exactly what I need.
Note that the real length of full_population is 2 million, number_of_iterations = 5000, and drawn_observations ranges from 20k to 600k.
Any help on that would be awesome!
Solution 1:[1]
Use random permutations after repeatedly tiling your full_population array:
repeats = np.tile(full_population, (number_of_iterations, 1))
permutations = rng.permuted(repeats, axis=1)
sample_array = permutations[:, :drawn_observations]
This should be much faster than the looping approach!
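One thing to watch with the tiling step is memory: the tiled array has number_of_iterations rows and len(full_population) columns, so at the sizes in the question (5000 rows of 2 million values, assuming 8-byte integers) it would need roughly 80 GB. Below is a minimal sketch of the same tile-and-permute idea applied in batches of rows to bound peak memory. The function name `sample_rows` and the `batch_size` parameter are hypothetical, introduced only for this illustration.

```python
import numpy as np


def sample_rows(population, n_rows, n_draws, rng, batch_size=None):
    """Draw n_rows independent samples without replacement, one per row.

    sample_rows and batch_size are hypothetical names for this sketch.
    """
    population = np.asarray(population)
    if batch_size is None:
        batch_size = n_rows  # single batch: same as tiling everything at once
    out = np.empty((n_rows, n_draws), dtype=population.dtype)
    for start in range(0, n_rows, batch_size):
        stop = min(start + batch_size, n_rows)
        # Tile only the rows of this batch to limit peak memory.
        tiled = np.tile(population, (stop - start, 1))
        # Shuffle each row independently, then keep the first n_draws columns.
        out[start:stop] = rng.permuted(tiled, axis=1)[:, :n_draws]
    return out


full_population = np.arange(11)
rng = np.random.default_rng(0)
sample_array = sample_rows(full_population, n_rows=8, n_draws=6, rng=rng, batch_size=3)
print(sample_array.shape)  # (8, 6)
```

The loop here runs once per batch rather than once per row, so a larger batch_size trades memory for fewer Python-level iterations. Whether this beats the original loop at the real sizes is something to measure, since each row still requires shuffling the full population.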
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Charles Dupont |
