Efficient NumPy multiple sampling which results in a matrix

I would like to create a 2D NumPy matrix in which each row is a sample drawn without replacement from a larger population.

I've created the following code snippet:

import numpy as np

full_population = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
number_of_iterations = 8
drawn_observations = 6
rng = np.random.default_rng()

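sample_rows = []
for single_draw in range(number_of_iterations):
    # Each draw samples drawn_observations values without replacement.
    indices = rng.choice(a=full_population, size=drawn_observations, replace=False, shuffle=True)
    sample_rows.append(indices)
sample_matrix = np.array(sample_rows)  # shape: (number_of_iterations, drawn_observations)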

However, this code runs serially and is too slow for my needs.

I've tried looking it up. This vectorized question seems close, but it is not exactly what I need.

Note that in the real problem, full_population has about 2 million elements, number_of_iterations = 5000, and drawn_observations ranges from 20k to 600k.

Any help on that would be awesome!



Solution 1:[1]

Tile your full_population array into one row per iteration, shuffle each row independently, and keep the first drawn_observations columns:

repeats = np.tile(full_population, (number_of_iterations, 1))
permutations = rng.permuted(repeats, axis=1)
sample_array = permutations[:, :drawn_observations]

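Each row of permutations is an independent shuffle of the population, so its first drawn_observations entries form a sample without replacement. This replaces the Python loop with a single vectorized call and should be much faster than the looping approach!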
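As a quick sanity check, here is a minimal self-contained sketch of the approach, using the small example values from the question. The fixed seed and the rows_are_unique check are additions for illustration only and are not part of the original answer.

```python
import numpy as np

# Small example values taken from the question.
full_population = np.arange(11)
number_of_iterations = 8
drawn_observations = 6

# The seed is an assumption, added only to make the run reproducible.
rng = np.random.default_rng(seed=0)

# One copy of the population per row: shape (8, 11).
repeats = np.tile(full_population, (number_of_iterations, 1))

# Generator.permuted shuffles each row independently along axis=1.
permutations = rng.permuted(repeats, axis=1)

# The first drawn_observations columns of each shuffled row are a
# sample without replacement from the population.
sample_array = permutations[:, :drawn_observations]

# Illustrative check: no value repeats within any row.
rows_are_unique = all(
    len(np.unique(row)) == drawn_observations for row in sample_array
)

print(sample_array.shape)
print(rows_are_unique)
```

Note that the tiled array holds number_of_iterations * len(full_population) elements. With the sizes given in the question (5000 rows of about 2 million elements) that is roughly 10^10 values, so the rows would probably need to be processed in smaller batches to fit in memory.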

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Charles Dupont