NumPy: take many samples without replacement, one sample per row
I have a really big list. Imagine it looks something like this:
```python
test = ['llama', 'cow', 'horse', 'fish', 'sheep', 'goat', 'cat', 'dog']
```
I want to sample from this list many times, with each sample taken without replacement, and I want to avoid for loops in this case.
I've seen many solutions on Stack Overflow that come close, but none that do exactly what I need. Say each sample should have size 3. If I wanted to sample with replacement, this would work:
```python
np.random.choice(test, size=(100, 3))
```
This gives me 100 rows with a sample of 3 in each row. The problem is that any particular row might contain repeats, and I can't simply pass replace=False, because that applies to the whole (100, 3) block at once and 300 > len(test).
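To make the failure concrete, here is a minimal sketch using the legacy np.random.choice API from the question. With replace=False, size is treated as the total number of distinct draws, so a (100, 3) block asks for 300 distinct items from a population of 8 and NumPy raises ValueError.

```python
import numpy as np

test = ['llama', 'cow', 'horse', 'fish', 'sheep', 'goat', 'cat', 'dog']

# With replacement: always works, but a row may repeat an animal.
with_repl = np.random.choice(test, size=(100, 3))

# Without replacement: replace=False applies to all 100 * 3 = 300 draws
# at once, which is more than the 8 available items, so this raises.
try:
    np.random.choice(test, size=(100, 3), replace=False)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```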
Is there a way around this that maintains randomness? I saw potential solutions that use np.argsort, but I'm not sure the results are still truly random, since sorting is involved.
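On the randomness concern: the argsort approaches sort a block of i.i.d. random keys, not the data itself. Because every ordering of i.i.d. continuous keys is equally likely (exact ties among floating-point keys are possible but vanishingly rare), the argsort of each row is a uniformly random permutation. The sketch below checks this empirically for 3 items; the seed and the sample count are arbitrary choices for reproducibility.

```python
import numpy as np

# Arbitrary seed and sample count, chosen only to make the check reproducible.
rng = np.random.default_rng(0)

# 60,000 rows of 3 i.i.d. uniform keys; argsort turns each row into a permutation.
perms = rng.random((60_000, 3)).argsort(axis=1)

# Count each distinct permutation; all 3! = 6 should appear about equally often.
uniq, counts = np.unique(perms, axis=0, return_counts=True)
freqs = counts / counts.sum()

for perm, freq in zip(uniq, freqs):
    print(perm, round(float(freq), 4))
```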
Solution 1:[1]
Here's a vectorized approach using the rand + argsort/argpartition trick: draw one row of random keys per sample and take the positions of the 3 smallest keys in each row.
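```python
import numpy as np

# 100 rows of random keys; the indices of the 3 smallest keys per row are distinct
idx = np.random.rand(100, len(test)).argpartition(3, axis=1)[:, :3]
out = np.take(test, idx)
```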
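The same trick can be packaged as a reusable function. In the sketch below, sample_rows is a hypothetical helper (not a NumPy function), and np.random.default_rng assumes NumPy 1.17 or later. It uses kth=k - 1, which is enough to move the k smallest keys into the first k columns and also allows k == len(population). argpartition does not guarantee any particular order within those k columns, so if the order inside each row matters, argsort(axis=1)[:, :k] is the safer choice at the cost of a full sort.

```python
import numpy as np

def sample_rows(population, n_rows, k, rng=None):
    """Hypothetical helper: n_rows samples of size k, each drawn without replacement.

    Each row gets its own i.i.d. uniform keys; the positions of the k smallest
    keys form a uniformly random k-subset of the population for that row.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(population)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= len(population)")
    keys = rng.random((n_rows, n))
    # kth=k-1 moves the k smallest keys of each row into columns 0..k-1
    # (in no guaranteed order).
    idx = np.argpartition(keys, k - 1, axis=1)[:, :k]
    return np.take(population, idx)

test = ['llama', 'cow', 'horse', 'fish', 'sheep', 'goat', 'cat', 'dog']
out = sample_rows(test, 100, 3, rng=np.random.default_rng(42))
print(out.shape)  # (100, 3)
```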
Let's verify, with some help from pandas, that every row contains 3 unique elements:
```
In [51]: idx = np.random.rand(100, len(test)).argpartition(3, axis=1)[:, :3]
    ...: out = np.take(test, idx)

In [52]: import pandas as pd

In [53]: (pd.DataFrame(out).nunique(axis=1).values == 3).all()
Out[53]: True
```
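If pandas isn't available, the same check can be done in plain NumPy: sort each row so equal values become neighbours, then confirm that no adjacent pair is equal. rows_are_unique is a hypothetical name used only for this sketch.

```python
import numpy as np

def rows_are_unique(a):
    """Return True if no row of the 2-D array `a` contains a repeated value."""
    s = np.sort(a, axis=1)                    # equal values end up adjacent
    return bool((s[:, 1:] != s[:, :-1]).all())

test = ['llama', 'cow', 'horse', 'fish', 'sheep', 'goat', 'cat', 'dog']
idx = np.random.rand(100, len(test)).argpartition(3, axis=1)[:, :3]
out = np.take(test, idx)
print(rows_are_unique(out))  # True
```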
Solution 2:[2]
You can use random.sample for that. From the documentation:
> Return a k length list of unique elements chosen from the population sequence. Used for random sampling without replacement.
Then repeat the process n_times times with a list comprehension:
```python
import random

n_times = 100
n_sample = 3
[random.sample(test, n_sample) for i in range(n_times)]
```
```
[['llama', 'goat', 'sheep'],
 ['cat', 'horse', 'dog'],
 ['sheep', 'dog', 'goat'],
 ['cat', 'cow', 'llama'],
 ['dog', 'fish', 'horse'],
 ['llama', 'horse', 'cow'],
 ['dog', 'goat', 'cow'],
 ['llama', 'cow', 'sheep'],
 ['fish', 'dog', 'horse'],
 ...
```
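If the result should be reproducible and end up as a NumPy array like the other answers, one option (an assumption about the requirements, not part of the original answer) is a dedicated random.Random instance plus np.array. This is still a Python-level loop, so it won't match a vectorized approach for very large n_times.

```python
import random
import numpy as np

test = ['llama', 'cow', 'horse', 'fish', 'sheep', 'goat', 'cat', 'dog']
n_times = 100
n_sample = 3

# A private Random instance gives reproducible draws without touching the
# module-level random state; the seed 42 is arbitrary.
rng = random.Random(42)
samples = np.array([rng.sample(test, n_sample) for _ in range(n_times)])
print(samples.shape)  # (100, 3)
```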
Solution 3:[3]
You could run np.random.choice without replacement once per row and stack the results into a 2-D array:
```python
import numpy as np
np.array([np.random.choice(test, 3, replace=False) for i in range(100)])
```
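A loop-free variant of the per-row idea, which is a different technique from the answer above, is to give every row its own copy of the population and shuffle each row independently with Generator.permuted (available from NumPy 1.20). Like Solution 1, it materializes an array of n_rows by len(test) elements, so for a really big list the memory cost grows with the population size, not just with the sample size k.

```python
import numpy as np

test = ['llama', 'cow', 'horse', 'fish', 'sheep', 'goat', 'cat', 'dog']
n_rows, k = 100, 3
rng = np.random.default_rng()

# One copy of the population per row: shape (n_rows, len(test)).
tiled = np.tile(test, (n_rows, 1))

# permuted(..., axis=1) shuffles the elements of each row independently;
# the first k entries of a shuffled row are a sample without replacement.
out = rng.permuted(tiled, axis=1)[:, :k]
print(out.shape)  # (100, 3)
```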
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | Atnas |
