'Mysql SUM changes (to incorrect value) when adding ANY_VALUE to the select fields

I have a query, and I'm stumped as to why the SUM function is returning seemingly incorrect results when I add an additional (functionally dependent) field to the GROUP BY. Even when not changing the GROUP BY, and using ANY_VALUE, it still suddenly changes the SUM result.

The query is quite long, but for brevity, I'll only include the relevant part. I have not included cte which is a recursive query, but not relevant to the question. The example later on in, shows the full results of data regardless of what cte is.

select
cte.Id,
       YEAR(A.CreatedAt)                                                `Year`,
       MONTH(A.CreatedAt)                                               `Month`,
       (SELECT COALESCE(SUM(SR.AgencyProductionRevenue * SA.Split), 0)) `TotalRevenue`
from cte
         JOIN NetworkMembers `member` ON cte.Id = `member`.Id
         JOIN AccountTransactions A ON (
            `member`.AccountId = A.AccountId AND A.PaymentPeriodId is not null
        AND (A.CreatedAt > '2020-02-01')
    
    
    )
         LEFT JOIN SaleRevenue SR on A.SaleRevenueId = SR.Id
         LEFT JOIN SaleAgents SA
                   ON (SA.SaleId = SR.SaleId AND SA.NetworkMemberId = `member`.Id AND SA.SaleId is not null)
GROUP BY cte.Id, YEAR(A.CreatedAt), MONTH(A.CreatedAt);

This produces a result with a record:

Id Year Month TotalRevenue
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 -37.470000000000006

However, when I add ANY_VALUE(cte.ParentId), suddenly the resulting SUM for the same grouping is changed:

Id ParentId Year Month TotalRevenue
08d87138-676a-48b8-85c0-4692d914fdf2 NULL 2021 11 -1.2999999999999998

In my case, ParentId is functionally dependent on Id, I've even proved this by removing the GROUP BY and SUM clause and dumping out all records. For all records, whenever Id = 08d87138-676a-48b8-85c0-4692d914fdf2 then ParentId is NULL. I also tried adding ParentId to my GROUP BY:

GROUP BY cte.Id, cte.ParentId, YEAR(A.CreatedAt), MONTH(A.CreatedAt)

This produces the same incorrect sum in the second table above, even though ParentId is functionally dependent on Id.

To verify, thatSUM is wrong in the second table above, I can select ALL records with the following clause:

WHERE cte.Id = '08d87138-676a-48b8-85c0-4692d914fdf2' AND YEAR(A.CreatedAt) = 2021 AND MONTH(A.CreatedAt) = 11

Here is the full query:

select cte.Id,
       YEAR(A.CreatedAt)                                                `Year`,
       MONTH(A.CreatedAt)                                               `Month`,
       COALESCE(SR.AgencyProductionRevenue * SA.Split,0),
        cte.ParentId
    
from cte
         JOIN NetworkMembers `member` ON cte.Id = `member`.Id
         JOIN AccountTransactions A ON (
            `member`.AccountId = A.AccountId AND A.PaymentPeriodId is not null
        AND (A.CreatedAt > '2020-02-01')
    )
         LEFT JOIN SaleRevenue SR on A.SaleRevenueId = SR.Id
         LEFT JOIN SaleAgents SA
                   ON (SA.SaleId = SR.SaleId AND SA.NetworkMemberId = `member`.Id AND SA.SaleId is not null)
WHERE cte.Id = '08d87138-676a-48b8-85c0-4692d914fdf2' AND YEAR(A.CreatedAt) = 2021 AND MONTH(A.CreatedAt) = 11
GROUP BY cte.Id, cte.ParentId, YEAR(A.CreatedAt), MONTH(A.CreatedAt)

Here is the full results. Manually summing the COALESCE column gives the correct result of -37.470000000000006.

Id Year Month COALESCE(SR.AgencyProductionRevenue * SA.Split,0) ParentId
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 0 NULL
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 0 NULL
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 0 NULL
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 0 NULL
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 0 NULL
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 0 NULL
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 0 NULL
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 -0.09 NULL
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 -3.07 NULL
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 -0.61 NULL
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 -20.86 NULL
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 -0.22 NULL
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 -2.05 NULL
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 -0.29 NULL
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 -2.6 NULL
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 -0.19 NULL
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 -2.5 NULL
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 -0.34 NULL
08d87138-676a-48b8-85c0-4692d914fdf2 2021 11 -4.65 NULL

How come when I add ANY_VALUE(cte.ParentId), or GROUP BY cte.ParentId, the SUM becomes incorrect?

  1. I do not expect ANY_VALUE to have any effect on aggregates, since the GROUP BY fields do not change. In fact, I would expect that changing or adding a field in the select list should not have an impact on one of the other selected fields.
  2. GROUP BY cte.ParentId should not change the SUM since it is functionally dependent on Id.
  3. Replacing ANY_VALUE(cte.ParentId) with DEFAULT(cte.ParentId) produces the correct result with correct SUMS (but then the ParentId column is all null)

Can someone help explain this behavior?



Solution 1:[1]

I was able to resolve the issue, but I still don't fully understand why this worked.

I changed the last two LEFT JOINS to JOINS and then SUM worked correctly. So I assumed it might have been an issue with COALESCE since LEFT JOIN only affects the nullability of values.

I changed my expression from:

COALESCE(SUM(SR.AgencyProductionRevenue * SA.Split), 0) `TotalRevenue`

to:

SUM(COALESCE(SR.AgencyProductionRevenue, 0) * COALESCE(SA.Split, 0)) `TotalRevenue`

Now it works as expected.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Brad