Excluding a set of tuples from a ranged random sample in Python

I randomly sample N tuples from two different sets of numbers as follows:

import numpy as np

N = 4
set1 = list(range(10))
set2 = list(range(10, 20))

c1 = np.random.choice(set1, N)  # e.g. [9, 3, 7, 8]
c2 = np.random.choice(set2, N)  # e.g. [15, 13, 19, 12]
tuples = np.stack([c1, c2], axis=1)  # e.g. [[9, 15], [3, 13], [7, 19], [8, 12]]

For the next iteration I want to sample c1 and c2 again, but excluding the unique tuples I already have. The numbers can appear again, just not in the same combination (number1, number2). Ideally that would be something like:

new_tuples = np.random.choice([set1,set2],exclude=tuples)

One could just check them with np.unique and resample, but I was hoping there would be a more efficient way.

EDIT: Getting all possible combinations beforehand would be too expensive.



Solution 1:[1]

After the asker's comment, I fixed the code:

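import time

import numpy as np

begin = time.time()
N = 4
set1 = list(range(10**6))
set2 = list(range(10**6, 20**6))

# First iteration: draw N pairs and keep them as a set of tuples.
c1 = np.random.choice(set1, N)  # N values drawn from set1
c2 = np.random.choice(set2, N)  # N values drawn from set2
tuples = set(zip(c1, c2))
print(tuples)

# Next iteration: draw again and drop any pair that was already seen.
# Note: this can leave fewer than N tuples if a drawn pair collides.
c1 = np.random.choice(set1, N)
c2 = np.random.choice(set2, N)
new_tuples = set([(n1, n2) for (n1, n2) in zip(c1, c2) if (n1, n2) not in tuples][:N])
print(new_tuples)

# Union of everything sampled so far, then the elapsed time in seconds.
print(tuples | new_tuples)
print(time.time() - begin)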

The comments explain it step by step. Tested with 20 billion, it returned in 13 seconds! Output obtained:

{(315090, 13207382), (175935, 7922219), (249258, 59598185), (45681, 27246043)}
{(446782, 45042493), (122963, 12794175), (388061, 20418275), (328064, 48911155)}
{(315090, 13207382), (175935, 7922219), (328064, 48911155), (446782, 45042493), (249258, 59598185), (45681, 27246043), (122963, 12794175), (388061, 20418275)}
12.917975664138794
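
The filter above can return fewer than N tuples whenever a freshly drawn pair was already seen, or appears twice in the same draw. A minimal sketch of a rejection-sampling loop that keeps drawing until it has N new pairs might look like this. The function name sample_new_pairs and its parameters are hypothetical, and the sketch assumes that set1 and set2 contain no duplicates and that seen only holds pairs drawn from these two sets. It uses the standard-library random module with lazy range objects, so nothing has to be enumerated up front.

```python
import random


def sample_new_pairs(set1, set2, n, seen, rng=random):
    """Draw n distinct (a, b) pairs, a from set1 and b from set2,
    skipping every pair already in `seen`. Updates `seen` in place."""
    # Assumption: no duplicates in set1/set2 and `seen` is a subset of
    # their product, so this counts the pairs that are still available.
    total = len(set1) * len(set2)
    if n > total - len(seen):
        raise ValueError("not enough unseen pairs left")

    new_pairs = []
    while len(new_pairs) < n:
        pair = (rng.choice(set1), rng.choice(set2))
        if pair not in seen:       # average O(1) set lookup
            seen.add(pair)         # also blocks repeats within this batch
            new_pairs.append(pair)
    return new_pairs


set1 = range(10)
set2 = range(10, 20)
seen = set()
first = sample_new_pairs(set1, set2, 4, seen)
second = sample_new_pairs(set1, set2, 4, seen)  # shares no pair with first
```

As long as the number of seen pairs is small compared with len(set1) * len(set2), collisions are rare and the loop needs only about n draws; it slows down as the unseen pairs run out.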

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1