'Groupby & Sum - Create new column with added If Condition

I have the below DataFrame:

ID  Start           End              Variance
1   100000          120000           20000
1   1               0                -1
1   7815.58         7815.58          0
1   5261            5261             0
1   138783.2        89969.37         -48813.83
1   2459.92         2459.92          0
2   101421.99       93387.45         -8034.54
2   940.04          940.04           0
2   63.06           63.06            0
2   2454.86         2454.86          0
2   830             830              0
2   299             299              0
2   14000           12000            2000
2   1500            500              1000


I want to create a new column, Overspend Total. But I only want to sum the values that are greater than 0. The resulting DataFrame will look like this:

ID  Start           End              Variance        Overspend Total
1   100000          120000           20000           20000
1   1               0                -1              20000
1   7815.58         7815.58          0               20000
1   5261            5261             0               20000
1   138783.2        89969.37         -48813.83       20000
1   2459.92         2459.92          0               20000
2   101421.99       93387.45         -8034.54        3000
2   940.04          940.04           0               3000
2   63.06           63.06            0               3000
2   2454.86         2454.86          0               3000
2   830             830              0               3000
2   299             299              0               3000
2   14000           12000            2000            3000
2   1500            500              1000            3000

I tried the following

df['Overspend Variance'] = df[df['Variance'] > 0].groupby(df['ID']).transform('sum')

But I'm getting the below error:

ValueError: Wrong number of items passed 8, placement implies 1

I know df['Overspend Variance'] = df['Variance'].groupby(df['ID']).transform('sum') would work without a condition, but I can't figure out how to incorporate it with an extra condition.



Solution 1:[1]

We can use Series.where to replace the values that don't match the condition with NaN, then just groupby transform 'sum' since NaN values are ignored by 'sum' by default:

df['Overspend Total'] = (
    df['Variance'].where(df['Variance'] > 0).groupby(df['ID']).transform('sum')
)

Or explicitly replace with the additive identity (0) which will not affect the sum:

df['Overspend Total'] = (
    df['Variance'].where(df['Variance'] > 0, 0)
        .groupby(df['ID']).transform('sum')
)

Or with a lambda inside groupby transform:

df['Overspend Total'] = df.groupby('ID')['Variance'].transform(
    lambda s: s[s > 0].sum()
)

In any case df is:

    ID      Start        End  Variance  Overspend Total
0    1  100000.00  120000.00  20000.00          20000.0
1    1       1.00       0.00     -1.00          20000.0
2    1    7815.58    7815.58      0.00          20000.0
3    1    5261.00    5261.00      0.00          20000.0
4    1  138783.20   89969.37 -48813.83          20000.0
5    1    2459.92    2459.92      0.00          20000.0
6    2  101421.99   93387.45  -8034.54           3000.0
7    2     940.04     940.04      0.00           3000.0
8    2      63.06      63.06      0.00           3000.0
9    2    2454.86    2454.86      0.00           3000.0
10   2     830.00     830.00      0.00           3000.0
11   2     299.00     299.00      0.00           3000.0
12   2   14000.00   12000.00   2000.00           3000.0
13   2    1500.00     500.00   1000.00           3000.0

Solution 2:[2]

it could be done by filter the value less than 0 than group by and reassign

df = df.join(df[df.Variance>=0].groupby("ID")["Variance"].agg(sum),  on="ID", rsuffix="total")
df.columns = ["ID", "Start", "End", "Variance", "Overspend Total"]

    ID  Start   End Variance    Overspend Total
0   1   100000.00   120000.00   20000.00    20000.0
1   1   1.00    0.00    -1.00   20000.0
2   1   7815.58 7815.58 0.00    20000.0
3   1   5261.00 5261.00 0.00    20000.0
4   1   138783.20   89969.37    -48813.83   20000.0
5   1   2459.92 2459.92 0.00    20000.0
6   2   101421.99   93387.45    -8034.54    3000.0
7   2   940.04  940.04  0.00    3000.0
8   2   63.06   63.06   0.00    3000.0
9   2   2454.86 2454.86 0.00    3000.0
10  2   830.00  830.00  0.00    3000.0
11  2   299.00  299.00  0.00    3000.0
12  2   14000.00    12000.00    2000.00 3000.0
13  2   1500.00 500.00  1000.00 3000.0

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Henry Ecker
Solution 2 galaxyan