Split NumPy array by column value, while keeping track of row indices

Say I have a numpy array:

Y.shape = (n, 3)

where n is the number of rows in the numpy array.

I split Y based on the values of the second column by following this thread:

distances = [Y[Y[:, 1] == k] for k in np.unique(Y[:, 1])]

distances is now a list of N numpy arrays, where N is the number of distinct values in the second column. I then loop over distances and split each array again in the same way, this time by the last column, like so:

for idx, dist in enumerate(distances):    
  conditions = [dist[dist[:, 2] == k] for k in np.unique(dist[:, 2])]
  # Save conditions list and do something with it 

How in numpy can I get the row indices of the original Y numpy array that correspond to each numpy array in conditions?



Solution 1:[1]

Assuming you're storing conditions in another list (I used all_conditions in my code), then this is a potential start-to-finish solution:

from functools import reduce
import operator
import numpy as np

# The code you posted
distances = [Y[Y[:, 1] == k] for k in np.unique(Y[:, 1])]

# conditions are stored in this list
all_conditions = []
for idx, dist in enumerate(distances):
    conditions = [dist[dist[:, 2] == k] for k in np.unique(dist[:, 2])]
    all_conditions.append(conditions)

# This step flattens all_conditions so there are no nested lists.
all_conditions = reduce(operator.concat, list(all_conditions))

# Each element of all_conditions is a 2-D array of shape (m, 3), which is
# why every row of 3 appears inside an extra bracket. Indexing with [0]
# takes the first row of each group.
# There is probably a more efficient way to extract them than a for loop,
# but this is the best I can come up with.

indices = np.zeros((len(all_conditions), 3), dtype=int)
for i in range(len(all_conditions)):
    indices[i] = all_conditions[i][0]

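# Select the values from X, using each row of indices as an (i, j, k) triple.
# X is not defined in the question; it is assumed here to be an existing
# array with at least three dimensions.
selected = X[tuple(indices.T)]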

Let me know if there's anything that needs clarification.
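
If what you need is the row positions in Y rather than the row values, one possible alternative is to build the boolean masks against the full Y array and convert each mask to indices with np.flatnonzero. This is only a minimal sketch, not the method from the answer above: the sample Y and the groups dictionary are made up for illustration.

```python
import numpy as np

# Hypothetical sample data standing in for Y; any (n, 3) array should work.
Y = np.array([
    [0.1, 1, 10],
    [0.2, 2, 10],
    [0.3, 1, 20],
    [0.4, 1, 10],
    [0.5, 2, 30],
])

# groups maps (second-column value, third-column value) -> row indices in Y.
groups = {}
for k1 in np.unique(Y[:, 1]):
    mask1 = Y[:, 1] == k1                  # rows matching the second column
    for k2 in np.unique(Y[mask1, 2]):
        mask = mask1 & (Y[:, 2] == k2)     # rows matching both columns
        # flatnonzero returns the positions of the True entries,
        # i.e. the row indices in the original Y.
        groups[(k1.item(), k2.item())] = np.flatnonzero(mask)

for key, idx in groups.items():
    # Y[idx] holds the same rows as the matching array in conditions.
    print(key, idx)
```

Because every mask is computed on Y itself and never on an already-split sub-array, the indices always refer to the original rows, and Y[idx] recovers the split array whenever it is needed.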

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 AJH