'Compare AVG of rows in a group with all rows that are NOT in the group (BigQuery)

I have a data set that looks like this:

date grp_name uid value_a value_b value_c
2022-01-01 A 1 1 10 5
2022-01-01 B 2 7 1 20
2022-01-01 C 10 7 3 20
2022-01-01 A 3 3 12 4
2022-01-02 B 2 6 1 21
2022-01-02 B 5 3 4 19
2022-01-03 A 6 1 15 6
2022-01-03 C 7 8 2 22
2022-01-03 D 9 10 2 18

For each date, and each grp_name, I want to calculate the AVG of value_a, value_b and value_c accross all rows, and (here's where I run into problems): the AVG of value_a, value_b and value_c for all rows that are NOT in the group.

Expected for grp_name = A on date = 2022-01-01. I imagine generating an in_grp column to separate the average values that are from the group at hand from those that are from non-group members.

date grp_name in_grp value_a value_b value_c
2022-01-01 A TRUE 2 11 4.5
2022-01-01 A FALSE 7 2 20

Here is the simple query that I've written so far, that lacks the ability to pick up non-group members for the averages, and create the in_grp column to separate the group members from the non-group members:

SELECT
  date,
  grp_name,
  AVG(value_a) value_a,
  AVG(value_b) value_b,
  AVG(value_c) value_c
FROM table
GROUP BY date, grp_name

Any advice on how to solve this?



Solution 1:[1]

Consider below approach

with temp as (
  select distinct date, grp_name,
    count(*) over() count_all,
    count(*) over(partition by date, grp_name) count_in_grp,
    sum(value_a) over() sum_a,
    sum(value_a) over(partition by date, grp_name) sum_a_in_grp,
    sum(value_b) over() sum_b,
    sum(value_b) over(partition by date, grp_name) sum_b_in_grp,
    sum(value_c) over() sum_c,
    sum(value_c) over(partition by date, grp_name) sum_c_in_grp,
  from your_table
)
select date, grp_name, true as in_grp, 
  sum_a_in_grp / count_in_grp as value_a,
  sum_b_in_grp / count_in_grp as value_b,
  sum_c_in_grp / count_in_grp as value_c
from temp 
union all
select date, grp_name, false as in_grp, 
  (sum_a - sum_a_in_grp) / (count_all - count_in_grp) as value_a,
  (sum_b - sum_b_in_grp) / (count_all - count_in_grp) as value_b,
  (sum_c - sum_c_in_grp) / (count_all - count_in_grp) as value_c
from temp
-- order by date, grp_name, in_grp desc              

if applied to sample data in your question output is

enter image description here

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Mikhail Berlyant