Python, NumPy: all unique combinations of a numpy.array() vector
I want to get all unique combinations of the elements of a numpy.array vector (or a pandas.Series). I used itertools.combinations, but it is very slow: for an array of size (1000,) it takes many hours. Here is my code using itertools (what I actually need is the absolute difference of each combination):
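import itertools
import numpy as np
import pandas as pd

def a(array):
    temp = pd.Series([])
    for i in itertools.combinations(array, 2):
        # Note: Series.append was removed in pandas 2.0 (pd.concat replaces it)
        temp = temp.append(pd.Series(np.abs(i[0]-i[1])))
    temp.index = range(len(temp))
    return temp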
As you can see, no pair is repeated. sklearn.utils.extmath.cartesian is really fast, but it returns repeated pairs, which I do not want. I need help rewriting the function above without itertools so that it is much faster for large vectors.
Solution 1:[1]
For a random array of ints:
import numpy as np
import pandas as pd
import itertools as it
b = np.random.randint(0, 8, size=6)
# e.g. array([7, 0, 6, 7, 1, 5])
pd.Series(list(it.combinations(np.unique(b), 2)))
It returns:
0 (0, 1)
1 (0, 5)
2 (0, 6)
3 (0, 7)
4 (1, 5)
5 (1, 6)
6 (1, 7)
7 (5, 6)
8 (5, 7)
9 (6, 7)
dtype: object
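As a follow-up sketch (not part of the original answer), pairs like these can be turned into the absolute differences the question asks for. Note that np.unique drops the repeated 7, so this produces 10 pairs rather than the 15 that itertools.combinations would give on all six elements. The fixed input array and the names `pairs` and `diffs` are illustrative assumptions.

```python
import itertools as it

import numpy as np
import pandas as pd

# Fixed input so the result is reproducible (same values as the example above).
b = np.array([7, 0, 6, 7, 1, 5])

# All pairs of distinct values, as a (10, 2) integer array.
pairs = np.array(list(it.combinations(np.unique(b), 2)))

# Absolute difference of each pair, computed in one vectorized step.
diffs = pd.Series(np.abs(pairs[:, 0] - pairs[:, 1]))
print(diffs.tolist())
```

This still builds the pairs with itertools, so it mainly avoids the repeated Series appends; it is unlikely to be the fastest option for large inputs.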
Solution 2:[2]
You could take the upper triangular part of a matrix formed on the Cartesian product with the binary operation (here subtraction, as in your example):
import numpy as np
n = 3
a = np.random.randn(n)
print(a)
print(a - a[:, np.newaxis])
print((a - a[:, np.newaxis])[np.triu_indices(n, 1)])
This gives:
[ 0.04248369 -0.80162228 -0.44504522]
[[ 0. -0.84410597 -0.48752891]
[ 0.84410597 0. 0.35657707]
[ 0.48752891 -0.35657707 0. ]]
[-0.84410597 -0.48752891 0.35657707]
With n=1000 (and output piped to /dev/null), this runs in 0.131s on my relatively modest laptop.
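The same idea can be applied to the function in the question. The sketch below is one possible adaptation, not the answerer's code: it indexes the array with np.triu_indices directly instead of building the full n-by-n difference matrix first, and it adds the np.abs call from the question. The function name `pairwise_abs_diff` and the sample data are assumptions made for illustration.

```python
import itertools

import numpy as np
import pandas as pd


def pairwise_abs_diff(values):
    """Absolute difference of every unordered pair of elements."""
    arr = np.asarray(values)
    # Row and column indices of the strict upper triangle (i < j),
    # listed in the same order that itertools.combinations yields pairs.
    i, j = np.triu_indices(len(arr), k=1)
    return pd.Series(np.abs(arr[i] - arr[j]))


sample = np.array([7, 0, 6, 7, 1, 5])
result = pairwise_abs_diff(sample)

# Cross-check against the straightforward itertools version.
expected = [int(abs(x - y)) for x, y in itertools.combinations(sample, 2)]
print(result.tolist() == expected)
```

For n elements this produces n*(n-1)/2 values (499,500 for n=1000), and duplicates in the input are kept as separate positions, as in the original function.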
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Jaroslav Bezděk |
| Solution 2 | Rory Yorke |
