Finding groups of consecutive X-values for a given Y-value in a structured numpy array containing pairs of values?

I have a structured array in Python, created with numpy:

import numpy as np
struct_array = np.array([(32, 250), (33, 599), (33, 250), (50, 231), (34, 250), (35, 250), (36, 250), (32, 700), (37, 250), (30, 891), (39, 210), (40, 250), (41, 250), (67, 250), (71, 250), (72, 250)], dtype=[('x', '<i4'), ('y', '<i4')])

In this structured array, I have:

  • X values, which are integers and are the first value in each pair, e.g. x = 32 in (32,250)
  • Y values, which are integers and are the second value in each pair, e.g. y = 250 in (32,250)

In this structured array, it is guaranteed that each x and y combination is unique. That is the only guarantee. There will only be one of (32,250), but there could be (32,370) or (21,250) for example, meaning that individual x and y values can recur, but the combination is unique.

For a given y value, I need to determine the groups of consecutive x values.

From the above structured array, I should have:

  • Y = 250 : [array([32, 33, 34, 35, 36, 37]), array([40, 41]), array([67]), array([71, 72])]
  • Y = 599 : [array([33])]
  • Y = 231 : [array([50])]
  • Y = 700 : [array([32])]
  • Y = 210 : [array([39])]
  • Y = 891 : [array([30])]

where, for a given value of Y, I would need to know the groups of consecutive X values.

I have a function that will list consecutive groups:

def consecutive(data, stepsize=1):
    return np.split(data, np.where(np.diff(data) != stepsize)[0]+1)

which, if I run as follows:

consecutive(struct_array["x"])

I obtain

[array([32, 33]),
 array([33]),
 array([50]),
 array([34, 35, 36]),
 array([32]),
 array([37]),
 array([30]),
 array([39, 40, 41]),
 array([67]),
 array([71, 72])]
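
For reference, the steps inside `consecutive` can be traced one at a time. This is only a sketch on a made-up 1-D array (`data` and `unsorted` below are illustrative values, not taken from the structured array), and it also shows that the function only finds runs when its input is already sorted:

```python
import numpy as np

data = np.array([32, 33, 34, 40, 41, 67])  # illustrative, already sorted

# Difference between neighbours: 1 inside a run, something else at a break.
steps = np.diff(data)

# Index in `data` at which each new run starts (one past each break).
breaks = np.where(steps != 1)[0] + 1

# Cut the array at those indices.
groups = np.split(data, breaks)
print(groups)

# On unsorted input the same logic fragments the runs,
# so sorting first matters.
unsorted = np.array([32, 34, 33])
print(np.split(unsorted, np.where(np.diff(unsorted) != 1)[0] + 1))
sorted_groups = np.split(np.sort(unsorted),
                         np.where(np.diff(np.sort(unsorted)) != 1)[0] + 1)
print(sorted_groups)
```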

This consecutive function works well with a plain 1-D array of integers. With this structured array, it just splits the X-values in the order they appear, without considering the Y-values. I need the X-values grouped by their Y-value first.

So, I wrote the following loop:

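net_dict = {}
for y_value in np.unique(struct_array["y"]):
    # Rows whose y equals the current y value
    matching_indexes = np.where(struct_array["y"] == y_value)
    # Sort the x values first: consecutive() only detects runs in sorted input
    consec_result = consecutive(np.sort(struct_array[matching_indexes]["x"]))
    net_dict[y_value] = consec_result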

Which provides the correct result of:

{210: [array([39])],
 231: [array([50])],
 250: [array([32, 33, 34, 35, 36, 37]),
  array([40, 41]),
  array([67]),
  array([71, 72])],
 599: [array([33])],
 700: [array([32])],
 891: [array([30])]}

However, is there a more efficient approach? There seem to be many tricks in numpy, and I was wondering whether one of them could be applied here.
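
One possible direction, offered as a minimal sketch rather than a measured improvement: sort the whole array once by y and then by x, and perform a single split wherever y changes or x stops advancing by the step size. The helper name `consecutive_by_y` is hypothetical, and whether this is actually faster than the per-y loop has not been benchmarked here.

```python
import numpy as np

struct_array = np.array(
    [(32, 250), (33, 599), (33, 250), (50, 231), (34, 250), (35, 250),
     (36, 250), (32, 700), (37, 250), (30, 891), (39, 210), (40, 250),
     (41, 250), (67, 250), (71, 250), (72, 250)],
    dtype=[('x', '<i4'), ('y', '<i4')])


def consecutive_by_y(arr, stepsize=1):
    """Hypothetical helper: map each y to its runs of consecutive x values."""
    if arr.size == 0:
        return {}
    # Sort by y first, then by x within each y.
    s = np.sort(arr, order=['y', 'x'])
    x, y = s['x'], s['y']
    # A new run starts where y changes or x does not advance by stepsize.
    new_run = (np.diff(y) != 0) | (np.diff(x) != stepsize)
    cuts = np.where(new_run)[0] + 1
    result = {}
    # Each run starts at index 0 or at one of the cut positions.
    for xs, start in zip(np.split(x, cuts), np.r_[0, cuts]):
        result.setdefault(int(y[start]), []).append(xs)
    return result


net_dict = consecutive_by_y(struct_array)
print(net_dict)
```

The remaining Python loop runs once per group of consecutive values rather than once per y value with a full-array comparison each time, so any gain would depend on how many distinct y values and runs the data contains.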



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
