'Compare AVG of rows in a group with all rows that are NOT in the group (BigQuery)
I have a data set that looks like this:
| date | grp_name | uid | value_a | value_b | value_c |
|---|---|---|---|---|---|
| 2022-01-01 | A | 1 | 1 | 10 | 5 |
| 2022-01-01 | B | 2 | 7 | 1 | 20 |
| 2022-01-01 | C | 10 | 7 | 3 | 20 |
| 2022-01-01 | A | 3 | 3 | 12 | 4 |
| 2022-01-02 | B | 2 | 6 | 1 | 21 |
| 2022-01-02 | B | 5 | 3 | 4 | 19 |
| 2022-01-03 | A | 6 | 1 | 15 | 6 |
| 2022-01-03 | C | 7 | 8 | 2 | 22 |
| 2022-01-03 | D | 9 | 10 | 2 | 18 |
For each date, and each grp_name, I want to calculate the AVG of value_a, value_b and value_c accross all rows, and (here's where I run into problems): the AVG of value_a, value_b and value_c for all rows that are NOT in the group.
Expected for grp_name = A on date = 2022-01-01. I imagine generating an in_grp column to separate the average values that are from the group at hand from those that are from non-group members.
| date | grp_name | in_grp | value_a | value_b | value_c |
|---|---|---|---|---|---|
| 2022-01-01 | A | TRUE | 2 | 11 | 4.5 |
| 2022-01-01 | A | FALSE | 7 | 2 | 20 |
Here is the simple query that I've written so far, that lacks the ability to pick up non-group members for the averages, and create the in_grp column to separate the group members from the non-group members:
SELECT
date,
grp_name,
AVG(value_a) value_a,
AVG(value_b) value_b,
AVG(value_c) value_c
FROM table
GROUP BY date, grp_name
Any advice on how to solve this?
Solution 1:[1]
Consider below approach
with temp as (
select distinct date, grp_name,
count(*) over() count_all,
count(*) over(partition by date, grp_name) count_in_grp,
sum(value_a) over() sum_a,
sum(value_a) over(partition by date, grp_name) sum_a_in_grp,
sum(value_b) over() sum_b,
sum(value_b) over(partition by date, grp_name) sum_b_in_grp,
sum(value_c) over() sum_c,
sum(value_c) over(partition by date, grp_name) sum_c_in_grp,
from your_table
)
select date, grp_name, true as in_grp,
sum_a_in_grp / count_in_grp as value_a,
sum_b_in_grp / count_in_grp as value_b,
sum_c_in_grp / count_in_grp as value_c
from temp
union all
select date, grp_name, false as in_grp,
(sum_a - sum_a_in_grp) / (count_all - count_in_grp) as value_a,
(sum_b - sum_b_in_grp) / (count_all - count_in_grp) as value_b,
(sum_c - sum_c_in_grp) / (count_all - count_in_grp) as value_c
from temp
-- order by date, grp_name, in_grp desc
if applied to sample data in your question output is
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mikhail Berlyant |

