How to get a sorted cumulative array of values in numpy?
I have the following numpy arrays (actually pandas columns) that represent observations, each with a position and a value:
df['x'] = np.array([1, 2, 3, 2, 1, 1, 2, 3, 4, 5])
df['y'] = np.array([2, 1, 1, 1, 1, 1, 1, 1, 3, 2])
Instead, I would like to get the following two arrays:
[1 2 3 4 5]
[4 3 2 3 2]
This basically means grouping all items with the same value in df['x'] and summing the corresponding values in df['y'] (in other words, getting the total of the values for each individual position).
What is the most straightforward way to achieve that in numpy?
Solution 1:[1]
We can try
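import numpy as np

def groupby(a, b):
    # Stable sort by the grouping keys b, then reorder both arrays the same way
    sidx = b.argsort(kind='mergesort')
    a_sorted = a[sidx]
    b_sorted = b[sidx]
    # Start index of each run of equal keys, plus the end of the array
    cut_idx = np.flatnonzero(np.r_[True, b_sorted[1:] != b_sorted[:-1], True])
    # Sum the values of a within each run
    out = [sum(a_sorted[i:j]) for i, j in zip(cut_idx[:-1], cut_idx[1:])]
    return out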
groupby(df['y'].values,df['x'].values)
Out[223]: [4, 3, 2, 3, 2]
Note: for the original function, you can refer to Divakar's answer (thanks again, Divakar, for teaching me numpy :-) ).
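As a possible alternative, here is a minimal sketch that avoids the Python-level loop and also returns the sorted positions, which the question asks for as the first array. It is not part of the answer above. It swaps in np.unique with return_inverse=True and np.bincount with weights, and it assumes that the values in y are integers, since bincount returns floats that are cast back here. The names positions, inverse and sums are chosen only for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 2, 1, 1, 2, 3, 4, 5])
y = np.array([2, 1, 1, 1, 1, 1, 1, 1, 3, 2])

# Sorted unique positions, plus for each element of x the index of its group.
positions, inverse = np.unique(x, return_inverse=True)

# bincount with weights adds up the y values that fall into each group.
# The result is float, so cast back to the dtype of y (assumes integer y).
sums = np.bincount(inverse, weights=y).astype(y.dtype)

print(positions)  # [1 2 3 4 5]
print(sums)       # [4 3 2 3 2]
```

With pandas columns, the same arrays can be passed in as df['x'].values and df['y'].values.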
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | BENY |
