Python, Numpy: all UNIQUE combinations of a numpy.array() vector

I want to get all unique combinations of a numpy.array vector (or a pandas.Series). I used itertools.combinations, but it's very slow: for an array of size (1000,) it takes many hours. Here is my code using itertools (I actually compute the differences of the combinations):

import itertools
import numpy as np
import pandas as pd

def a(array):
    temp = pd.Series([], dtype=float)
    for i in itertools.combinations(array, 2):
        # Series.append is deprecated; pd.concat does the same thing
        temp = pd.concat([temp, pd.Series(np.abs(i[0] - i[1]))])
    temp.index = range(len(temp))
    return temp

As you can see, there are no repetitions. sklearn.utils.extmath.cartesian is really fast, but it produces repetitions, which I do not want. I need help rewriting the above function without itertools, and with much better speed for large vectors.



Solution 1:[1]

For a random array of ints (note that np.unique first removes duplicate values, so combinations are taken over the distinct values only):

import numpy as np
import pandas as pd
import itertools as it

b = np.random.randint(0, 8, ((6,)))
# array([7, 0, 6, 7, 1, 5])
pd.Series(list(it.combinations(np.unique(b), 2)))

it returns:

0    (0, 1)
1    (0, 5)
2    (0, 6)
3    (0, 7)
4    (1, 5)
5    (1, 6)
6    (1, 7)
7    (5, 6)
8    (5, 7)
9    (6, 7)
dtype: object
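If, as in the question, you want the absolute differences rather than the pairs themselves, the same combinations can be mapped through a comprehension. A small sketch extending this solution (the array `b` is fixed here for reproducibility):

```python
import itertools as it
import numpy as np
import pandas as pd

b = np.array([7, 0, 6, 7, 1, 5])
# Combinations over the distinct values, as in Solution 1
pairs = it.combinations(np.unique(b), 2)
# Absolute difference of each unique-value pair
diffs = pd.Series([abs(x - y) for x, y in pairs])
print(diffs.tolist())  # [1, 5, 6, 7, 4, 5, 6, 1, 2, 1]
```

This is still a Python-level loop, so it shares the question's scaling problem; it only shows how to go from pairs to differences.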

Solution 2:[2]

You could take the upper triangular part of a matrix formed on the Cartesian product with the binary operation (here subtraction, as in your example):

import numpy as np
n = 3
a = np.random.randn(n)
print(a)
print(a - a[:, np.newaxis])
print((a - a[:, np.newaxis])[np.triu_indices(n, 1)])

gives

[ 0.04248369 -0.80162228 -0.44504522]
[[ 0.         -0.84410597 -0.48752891]
 [ 0.84410597  0.          0.35657707]
 [ 0.48752891 -0.35657707  0.        ]]
[-0.84410597 -0.48752891  0.35657707]

with n=1000 (and output piped to /dev/null) this runs in 0.131s on my relatively modest laptop.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Jaroslav Bezděk
Solution 2: Rory Yorke