How to get a sorted cumulative array of values in numpy?

I have the following numpy arrays (actually pandas columns) that represent observations, each with a position and a value:

df['x'] = np.array([1, 2, 3, 2, 1, 1, 2, 3, 4, 5])
df['y'] = np.array([2, 1, 1, 1, 1, 1, 1, 1, 3, 2])

From these, I would like to get the following two arrays:

[1 2 3 4 5]
[4 3 2 3 2]

That is, group all items with the same value in df['x'] and sum the corresponding values in df['y'] (in other words, get the total of the values observed at each individual position).

What is the most straightforward way to achieve that in numpy?



Solution 1:[1]

We can try

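import numpy as np

def groupby(a, b):
    # Stable sort by the keys in b so that equal keys become contiguous
    sidx = b.argsort(kind='mergesort')
    a_sorted = a[sidx]
    b_sorted = b[sidx]
    # Boundaries of each run of equal keys, including both ends of the array
    cut_idx = np.flatnonzero(np.r_[True, b_sorted[1:] != b_sorted[:-1], True])
    # Sum the values of a within each run
    out = [sum(a_sorted[i:j]) for i, j in zip(cut_idx[:-1], cut_idx[1:])]
    return out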


groupby(df['y'].values,df['x'].values)
Out[223]: [4, 3, 2, 3, 2]
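
The per-group Python sum can probably be replaced by np.add.reduceat, which sums each segment in one vectorized call. The following is only a sketch under that assumption: groupby_reduceat is a hypothetical name, not part of the answer above, and it assumes non-empty 1-D inputs. It also returns the sorted unique positions, since the question asks for both arrays.

```python
import numpy as np

def groupby_reduceat(a, b):
    # Hypothetical variant of groupby; assumes a and b are non-empty 1-D arrays
    sidx = b.argsort(kind='mergesort')  # stable sort by key
    a_sorted = a[sidx]
    b_sorted = b[sidx]
    # Start index of each run of equal keys
    starts = np.flatnonzero(np.r_[True, b_sorted[1:] != b_sorted[:-1]])
    # reduceat sums a_sorted[starts[k]:starts[k+1]] for each k (last run goes to the end)
    return b_sorted[starts], np.add.reduceat(a_sorted, starts)

x = np.array([1, 2, 3, 2, 1, 1, 2, 3, 4, 5])
y = np.array([2, 1, 1, 1, 1, 1, 1, 1, 3, 2])
keys, sums = groupby_reduceat(y, x)
print(keys)  # [1 2 3 4 5]
print(sums)  # [4 3 2 3 2]
```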

Note: for the original version of the groupby function, see Divakar's answer (thanks again, Divakar, for teaching me numpy :-)).
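
As an alternative that avoids the explicit sort, the same grouping can be sketched with np.unique and np.bincount. This is a swapped-in technique, not the method from the answer above. It assumes 1-D inputs, and the names positions, inverse and totals are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 2, 1, 1, 2, 3, 4, 5])
y = np.array([2, 1, 1, 1, 1, 1, 1, 1, 3, 2])

# positions: sorted unique values of x
# inverse: for each element of x, the index of its value within positions
positions, inverse = np.unique(x, return_inverse=True)

# bincount with weights adds up y per group; the result is float,
# so cast back to the dtype of y
totals = np.bincount(inverse, weights=y).astype(y.dtype)

print(positions)  # [1 2 3 4 5]
print(totals)     # [4 3 2 3 2]
```

Since the data already lives in a DataFrame, df.groupby('x')['y'].sum() should give the same totals, indexed by position.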

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 BENY