Python pandas version of Excel SUM + OFFSET function
Let's say the input is:
import pandas as pd

d = {'col1': [0,0,0,0,0,0,0,0,0,0],
     'col2': [1,1,1,1,1,1,1,1,1,1],
     'col3': [2,2,2,2,2,2,2,2,2,2],
     'col4': [3,3,3,3,3,3,3,3,3,3],
     'col5': [4,4,4,4,4,4,4,4,4,4],
     'col6': [5,5,5,5,5,5,5,5,5,5],
     'c_off': [0,1,2,3,0,1,2,3,0,1]}
df = pd.DataFrame(data=d)
ref_width = 2
I want to get an output column like this:
df['output'] = [1, 3, 5, 7, 1, 3, 5, 7, 1, 3]
Here ref_width is the number of columns to sum over, and c_off is how many columns to the right of col1 the sum starts.
For example, when c_off is 0, the sum covers the first 2 columns, col1 and col2. When c_off is 1, it covers col2 and col3. When c_off is 3, it covers col4 and col5.
Is there a vectorized way to do this?
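For reference, the rule described above can be written as a plain, non-vectorized loop. This is only a sketch to pin down the expected result, not one of the answers; the names sum_offset_loop and value_cols are hypothetical and do not appear in the original post.

```python
import pandas as pd

def sum_offset_loop(df, value_cols, offset_col, width):
    """Row by row, sum `width` consecutive value columns starting at the row's offset."""
    values = df[value_cols].to_numpy()
    offsets = df[offset_col].to_numpy()
    # Slice each row at its own offset; this is the slow baseline to beat.
    return [int(values[i, off:off + width].sum()) for i, off in enumerate(offsets)]

# Same data as in the question, built more compactly
d = {f"col{k + 1}": [k] * 10 for k in range(6)}
d["c_off"] = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
df = pd.DataFrame(data=d)
ref_width = 2

value_cols = [f"col{k}" for k in range(1, 7)]  # hypothetical helper list
expected = sum_offset_loop(df, value_cols, "c_off", ref_width)
print(expected)
```

A loop like this is handy as a check against any vectorized version on small inputs.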
Solution 1:[1]
Here's a vectorized solution that uses numpy broadcasting:
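import numpy as np

cols_to_process = 6  # number of value columns (col1..col6)
# Boolean mask: True at the starting column of each row
x = np.arange(cols_to_process) == df['c_off'].to_numpy()[:, None]
# Shift the mask right by 1..ref_width-1 and merge, so each row marks ref_width columns
x += np.add.reduce([np.pad(x, ((0,0),(i,0)))[:, :-i] for i in range(1,ref_width)]).astype(bool)
df['output'] = df.iloc[:, :cols_to_process].to_numpy()[x].reshape(-1, ref_width).sum(axis=1)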
Output:
>>> df
   col1  col2  col3  col4  col5  col6  c_off  output
0     0     1     2     3     4     5      0       1
1     0     1     2     3     4     5      1       3
2     0     1     2     3     4     5      2       5
3     0     1     2     3     4     5      3       7
4     0     1     2     3     4     5      0       1
5     0     1     2     3     4     5      1       3
6     0     1     2     3     4     5      2       5
7     0     1     2     3     4     5      3       7
8     0     1     2     3     4     5      0       1
9     0     1     2     3     4     5      1       3
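To see how the mask in Solution 1 is built, here is a step-by-step sketch on four rows, using NumPy only. It unrolls the list comprehension into an explicit loop; the variable names start, shifted, mask and values are illustrative and are not part of the original answer.

```python
import numpy as np

c_off = np.array([0, 1, 2, 3])
cols_to_process = 6
ref_width = 2

# Broadcasting a (6,) range against a (4, 1) column gives a (4, 6) mask
# that is True only at each row's starting column.
start = np.arange(cols_to_process) == c_off[:, None]

# Shift the starting mask right by i columns (pad on the left, trim on the right)
# and OR it in, so each row ends up with ref_width consecutive True values.
mask = start.copy()
for i in range(1, ref_width):
    shifted = np.pad(start, ((0, 0), (i, 0)))[:, :-i]
    mask |= shifted

# Every row holds the values 0..5, as in the question's data
values = np.tile(np.arange(cols_to_process), (len(c_off), 1))

# Boolean indexing flattens row by row, so reshaping to (-1, ref_width)
# puts each row's window back on its own line before summing.
output = values[mask].reshape(-1, ref_width).sum(axis=1)
print(mask.astype(int))
print(output)
```

The reshape step relies on every row selecting exactly ref_width columns, so this approach appears to need c_off + ref_width <= cols_to_process in every row; a window that runs past the last column would leave that row with fewer selected values.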
Solution 2:[2]
Try this:
df['output'] = df.agg(np.array, 1).apply(lambda x: x[x[-1]:-1][:ref_width].sum())
print(df)
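The slicing in the one-liner above can be unpacked on a single row. The sketch below first does that by hand, then shows a hypothetical variant that looks columns up by name, since x[-1] only equals c_off while c_off is the last column (for example, not after an output column has already been added). The names window_sum and value_cols are assumptions for illustration, and the variant is still row by row rather than vectorized.

```python
import pandas as pd

ref_width = 2

# One row unpacked by hand: col1..col6 followed by c_off = 2
row = [0, 1, 2, 3, 4, 5, 2]
start = row[-1]                      # the last element is c_off
window = row[start:-1][:ref_width]   # drop c_off, then keep ref_width values
single = sum(window)                 # sums the values of col3 and col4

# Hypothetical variant: select columns by name instead of by position
d = {f"col{k + 1}": [k] * 10 for k in range(6)}
d["c_off"] = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
df = pd.DataFrame(data=d)
value_cols = [f"col{k}" for k in range(1, 7)]

def window_sum(r):
    """Sum ref_width value columns starting at this row's c_off."""
    first = int(r["c_off"])
    return r[value_cols].iloc[first:first + ref_width].sum()

df["output"] = df.apply(window_sum, axis=1)
print(single)
print(df["output"].tolist())
```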
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | richardec |
| Solution 2 | |
