Find the row indexes of several values in a numpy array

I have an array X:

X = np.array([[4,  2],
              [9,  3],
              [8,  5],
              [3,  3],
              [5,  6]])

I want to find the row index of each of several values in this array:

searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])

For this example I would like a result like:

[0,3,4]

I have code that does this, but I think it is overly complicated:

X = np.array([[4,  2],
              [9,  3],
              [8,  5],
              [3,  3],
              [5,  6]])

searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])

result = []

for s in searched_values:
    idx = np.argwhere([np.all((X-s)==0, axis=1)])[0][1]
    result.append(idx)

print(result)

I found this answer for a similar question but it works only for 1d arrays.

Is there a way to do what I want in a simpler way?



Solution 1:[1]

Another alternative is to use asvoid (below) to view each row as a single value of void dtype. This reduces the 2D array to a 1D array, so you can use np.in1d as usual (the NumPy documentation recommends np.isin over np.in1d for new code):

import numpy as np

def asvoid(arr):
    """
    Based on http://stackoverflow.com/a/16973510/190597 (Jaime, 2013-06)
    View the array as dtype np.void (bytes). The items along the last axis are
    viewed as one value. This allows comparisons to be performed which treat
    entire rows as one value.
    """
    arr = np.ascontiguousarray(arr)
    if np.issubdtype(arr.dtype, np.floating):
        # Care needs to be taken here, since
        # np.array([-0.]).view(np.void) != np.array([0.]).view(np.void).
        # Adding 0. converts -0. to 0.
        arr += 0.
    return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))

X = np.array([[4,  2],
              [9,  3],
              [8,  5],
              [3,  3],
              [5,  6]])

searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])

idx = np.flatnonzero(np.in1d(asvoid(X), asvoid(searched_values)))
print(idx)
# [0 3 4]

Solution 2:[2]

The numpy_indexed package (disclaimer: I am its author) contains functionality for performing such operations efficiently (also uses searchsorted under the hood). In terms of functionality, it acts as a vectorized equivalent of list.index:

import numpy_indexed as npi
result = npi.indices(X, searched_values)

Note that the 'missing' kwarg gives you full control over the behavior for missing items, and that it works for nd-arrays (e.g. stacks of images) as well.

Update: using the same shapes as @Rik X=[520000,28,28] and searched_values=[20000,28,28], it runs in 0.8064 secs, using missing=-1 to detect and denote entries not present in X.
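For readers without numpy_indexed installed, the sorted-lookup idea can be sketched in plain NumPy. This is not the package's actual implementation, only an illustration: it assumes 2D inputs and a non-empty X, maps each distinct row to an integer id with np.unique(..., axis=0), and then uses np.searchsorted. The function name row_indices and its missing parameter are hypothetical.

```python
import numpy as np

def row_indices(X, searched_values, missing=-1):
    """Hypothetical helper: index in X of each row of searched_values."""
    X = np.asarray(X)
    searched_values = np.asarray(searched_values)
    # Give every distinct row one integer id shared by both arrays.
    combined = np.vstack([X, searched_values])
    _, ids = np.unique(combined, axis=0, return_inverse=True)
    ids = ids.ravel()  # the shape of the inverse has varied across NumPy versions
    x_ids, s_ids = ids[:len(X)], ids[len(X):]
    # Sort the ids of X once, then binary-search each searched id.
    order = np.argsort(x_ids, kind="stable")
    sorted_ids = x_ids[order]
    pos = np.searchsorted(sorted_ids, s_ids)
    pos = np.clip(pos, 0, len(sorted_ids) - 1)
    found = sorted_ids[pos] == s_ids
    # Rows that are absent from X get the `missing` marker.
    return np.where(found, order[pos], missing)

X = np.array([[4, 2],
              [9, 3],
              [8, 5],
              [3, 3],
              [5, 6]])

searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])

present = row_indices(X, searched_values)
print(present)
# [0 3 4]

with_absent = row_indices(X, np.array([[4, 2], [7, 7], [5, 6]]))
print(with_absent)
# [ 0 -1  4]
```

The result follows the order of searched_values, and duplicate rows in X resolve to their first occurrence because the sort is stable.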

Solution 3:[3]

X = np.array([[4,  2],
              [9,  3],
              [8,  5],
              [3,  3],
              [5,  6]])

S = np.array([[4, 2],
              [3, 3],
              [5, 6]])

result = [[i for i,row in enumerate(X) if (s==row).all()] for s in S]

or

result = [i for s in S for i,row in enumerate(X) if (s==row).all()]

if you want a flat list (assuming there is exactly one match per searched value).
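The flat version skips any searched value that is not in X and emits several indexes for a value that occurs more than once, so the result may not line up with S. A minimal sketch of a variant that always returns exactly one entry per searched row follows; the helper name first_match and the -1 marker are assumptions for illustration, not part of the answer above.

```python
import numpy as np

X = np.array([[4, 2],
              [9, 3],
              [8, 5],
              [3, 3],
              [5, 6]])

S = np.array([[4, 2],
              [7, 7],   # does not occur in X
              [5, 6]])

def first_match(X, s, missing=-1):
    """Hypothetical helper: index of the first row of X equal to s."""
    return next((i for i, row in enumerate(X) if (row == s).all()), missing)

result = [first_match(X, s) for s in S]
print(result)
# [0, -1, 4]
```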

Solution 4:[4]

Here is a fairly fast solution using numpy and hashlib that scales well. It can handle high-dimensional matrices or images in seconds: I ran it on a 520000 x (28 x 28) array and a 20000 x (28 x 28) array in 2 seconds on my CPU.

Code:

import numpy as np
import hashlib


X = np.array([[4,  2],
              [9,  3],
              [8,  5],
              [3,  3],
              [5,  6]])

searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])

# Hashing each row with SHA-1 appears to be efficient
xhash=[hashlib.sha1(row).digest() for row in X]
yhash=[hashlib.sha1(row).digest() for row in searched_values]

# Boolean mask: rows of X whose hash also occurs in searched_values
z=np.in1d(xhash,yhash)

## Use unique to get unique indices into the in1d results
_,unique=np.unique(np.array(xhash)[z],return_index=True)

## Compute unique indices by indexing an array of indices
idx=np.arange(len(xhash))
unique_idx=idx[z][unique]

print('unique_idx=',unique_idx)
print('X[unique_idx]=',X[unique_idx])

Output:

unique_idx= [4 3 0]
X[unique_idx]= [[5 6]
 [3 3]
 [4 2]]
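Note that np.unique sorts the digests, so the indexes above come out in digest order rather than in the order of searched_values. If the search order matters, one possible variation (a sketch, not the answer's method) is a plain dictionary keyed on the raw bytes of each row. It assumes both arrays share the same dtype and row shape; the names index_of and result are illustrative.

```python
import numpy as np

X = np.array([[4, 2],
              [9, 3],
              [8, 5],
              [3, 3],
              [5, 6]])

searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])

# Map the raw bytes of each row of X to the index of its first occurrence.
index_of = {}
for i, row in enumerate(X):
    index_of.setdefault(row.tobytes(), i)

# Look the rows up in search order; -1 marks rows that are absent from X.
result = [index_of.get(row.tobytes(), -1) for row in searched_values]
print(result)
# [0, 3, 4]
```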

Solution 5:[5]

Another way is to use the cdist function from scipy.spatial.distance:

from scipy.spatial.distance import cdist
np.nonzero(cdist(X, searched_values) == 0)[0]

Basically, we get the row numbers of X that are at distance zero from some row in searched_values, meaning the rows are equal. This makes sense if you think of the rows as coordinates.

Solution 6:[6]

I had a similar requirement and the following worked for me:

np.argwhere(np.isin(X, searched_values).all(axis=1))
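One caveat: np.isin tests each element independently against all values in searched_values, so a row such as [3, 5] is reported as a match even though it is not one of the searched rows. The sketch below shows this on an extended copy of X (the extra row and the name X_extra are made up for the demonstration) next to a row-wise comparison based on broadcasting.

```python
import numpy as np

# X from the question plus one extra row, [3, 5], built from values that
# appear in searched_values but never together as a row.
X_extra = np.array([[4, 2],
                    [9, 3],
                    [8, 5],
                    [3, 3],
                    [5, 6],
                    [3, 5]])

searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])

# Element-wise membership: row 5 is a false positive.
elementwise = np.argwhere(np.isin(X_extra, searched_values).all(axis=1)).ravel()
print(elementwise)
# [0 3 4 5]

# Row-wise comparison: matches[i, j] is True when X_extra[i] equals searched_values[j].
matches = (X_extra[:, None, :] == searched_values[None, :, :]).all(axis=2)
rowwise = np.flatnonzero(matches.any(axis=1))
print(rowwise)
# [0 3 4]
```

The broadcast comparison builds an intermediate boolean array of shape (len(X), len(searched_values), row length), so it suits small and medium inputs better than very large ones.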

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 unutbu
Solution 2 seralouk
Solution 3 Julien
Solution 4
Solution 5 Georgy
Solution 6 Azhar Khan