'Bigquery query to get sum of values of one column based on another column

I want to write a bigquery query to get values of a column which is the sum of values of another column, based on a "like" condition. In the below table column starts_with_count is what I want to fill. I have added the expected values for this column manually to show my expectation. Other column values are already present. The starts_with_count value is sum (full_count) where its link appears in other rows.

company link full_count starts_with_count (expected)
abc http://www.abc.net1 1 15 (= sum (full_count) where link like 'http://www.abc.net1%')
abc http://www.abc.net1/page1 2 9 (= sum (full_count) where link like 'http://www.abc.net1/page1%')
abc http://www.abc.net1/page1/folder1 3 3 (= sum (full_count) where link like 'http://www.abc.net1/page1/folder1%')
abc http://www.abc.net1/page1/folder2 4 4
abc http://www.abc.net1/page2 5 5
xyz http://www.xyz.net1/ 6 21
xyz http://www.xyz.net1/page1/ 7 15
xyz http://www.xyz.net1/page1/file1 8 8


Solution 1:[1]

Try this:

WITH sample AS (
  SELECT * FROM UNNEST([
    STRUCT('abc' AS company, 'http://www.abc.net1' AS link, 1 AS full_count),
    ('abc', 'http://www.abc.net1/page1', 2),
    ('abc', 'http://www.abc.net1/page1/folder1', 3),
    ('abc', 'http://www.abc.net1/page1/folder2', 4),
    ('abc', 'http://www.abc.net1/page2', 5),
    ('xyz', 'http://www.xyz.net1/', 6),
    ('xyz', 'http://www.xyz.net1/page1/', 7),
    ('xyz', 'http://www.xyz.net1/page1/file1', 8)
  ])
)
SELECT first.company, first.link, SUM(second.full_count) AS starts_with_count
  FROM sample first, sample second 
 WHERE STARTS_WITH(second.link, first.link)
 GROUP BY 1, 2
;

output:

enter image description here

Solution 2:[2]

Another option

select * except(links),
  ( select sum(full_count)
    from t.links
    where starts_with(link, t.link)
  ) starts_with_count
from (
  select *, 
    array_agg(struct(link, full_count)) over(partition by company) links
  from your_table
) t     

if applied to sample data in your question - output is

enter image description here

For the simple/dummy example you provided - performance improvement is significant!

Which one to use really depends on your real data!
To analyze - use EXECUTION DETAILS tab

enter image description here

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2