'Make calculations across multiple tables based on the table suffix in Bigquery
I have a database of daily tables (with prefixes formatted as yyyymmdd) with customers info, and I need to get a 90 day timeline of 90 day ARPUs (average revenue per user). i.e. one point on the graph = revenue / number of users for the previous 90 days.
I have tried the following:
    SELECT
    ninetyd_rev/ninetyd_userid as arpu
    From(
    SELECT
    SUM(revenue) OVER (ORDER BY activity_date ROWS BETWEEN 89 PRECEDING AND CURRENT ROW ) AS ninetyd_rev,
    COUNT(userid) OVER (ORDER BY activity_date ROWS BETWEEN 89 PRECEDING AND CURRENT ROW ) AS ninetyd_userid,
    FROM
      `table_*`
    WHERE
      _TABLE_SUFFIX BETWEEN FORMAT_DATE("%Y%m%d", DATE_ADD(CURRENT_DATE(), INTERVAL -180 DAY))
      AND FORMAT_DATE("%Y%m%d",CURRENT_DATE()))
(where activity_date corresponds to the table suffix)
but that results in the following error: Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 132% of limit. Top memory consumer(s): sort operations used for analytic OVER() clauses: 98% other/unattributed: 2%
I was also thinking that there might be a way to generate and array of numbers that can be looped in the interval somehow. Something like:
  generate array 0 to 89 as **variable x**
SELECT
    SUM(revenue)/COUNT(userid) AS arpu
    FROM
      `table_*`
    WHERE
      _TABLE_SUFFIX BETWEEN FORMAT_DATE("%Y%m%d", DATE_ADD(CURRENT_DATE(), INTERVAL **-x** DAY))
      AND FORMAT_DATE("%Y%m%d",CURRENT_DATE())
Yet I have not been able to achieve anything useful and not sure if that approach is possible at all. Thank you for all the tips!
Edit. The expected output would be
| day | arpu | 
|---|---|
| d1 | arpu1 (d1 and previous 89 days rev/#users) | 
| d2 | arpu2 (d2 and previous 89 days rev/#users) | 
| dn | arpun (dn and previous 89 days rev/#users) | 
ideally the days (d1, d2, ...dn) would be in yyyymmdd format.code
UPD. Based on the suggestion from @dikesh, I ran the query below. Yet Bigquery never finishes the calculation and does not give an output
WITH 
daily_userids AS (
SELECT activity_date as activity_date, userid as users 
FROM `table_x`
WHERE _table_suffix BETWEEN FORMAT_DATE("%Y%m%d", DATE_ADD(CURRENT_DATE(), INTERVAL -30 DAY)) AND FORMAT_DATE("%Y%m%d",CURRENT_DATE()) 
GROUP BY 1,2),
        
daily_revenue AS (
SELECT activity_date, SUM(revenue) AS revenue 
FROM `table_x`
WHERE _table_suffix BETWEEN FORMAT_DATE("%Y%m%d", DATE_ADD(CURRENT_DATE(), INTERVAL -30 DAY)) AND FORMAT_DATE("%Y%m%d",CURRENT_DATE()) 
GROUP BY 1 ),
        
-- Find number of users in last 30 days from the date
table_users AS (
SELECT t1.activity_date, count (DISTINCT t2.users) AS users
FROM daily_userids AS t1 JOIN daily_userids AS t2
ON t1.activity_date BETWEEN t2.activity_date AND DATE_ADD(t2.activity_date, INTERVAL 29 DAY)
GROUP BY t1.activity_date),
    
-- Find total revenue in last 30 days from the date
table_revenue AS (
SELECT activity_date, SUM(revenue) OVER (ORDER BY activity_date DESC ROWS BETWEEN CURRENT ROW AND 29 FOLLOWING) AS revenue
FROM daily_revenue
)
-- FIND ARPU
SELECT activity_date, users, revenue, ROUND(revenue / users, 4) AS arpu
FROM table_users JOIN table_revenue USING (activity_date)
ORDER BY activity_date DESC
							
						Solution 1:[1]
Here is the sample query to find ARPU of last two days from each date
WITH table1 AS (
  SELECT * FROM UNNEST([
    STRUCT(DATE '2022-05-09' AS day, 100 AS revenue, 'A' AS userid),
    STRUCT(DATE '2022-05-09' AS day, 110 AS revenue, 'B' AS userid),
    STRUCT(DATE '2022-05-09' AS day, 80 AS revenue, 'C' AS userid),
    STRUCT(DATE '2022-05-08' AS day, 100 AS revenue, 'B' AS userid),
    STRUCT(DATE '2022-05-08' AS day, 110 AS revenue, 'C' AS userid),
    STRUCT(DATE '2022-05-08' AS day, 80 AS revenue, 'D' AS userid),
    STRUCT(DATE '2022-05-07' AS day, 100 AS revenue, 'C' AS userid),
    STRUCT(DATE '2022-05-07' AS day, 110 AS revenue, 'D' AS userid),
    STRUCT(DATE '2022-05-07' AS day, 80 AS revenue, 'E' AS userid),
    STRUCT(DATE '2022-05-06' AS day, 100 AS revenue, 'E' AS userid),
    STRUCT(DATE '2022-05-06' AS day, 110 AS revenue, 'C' AS userid),
    STRUCT(DATE '2022-05-06' AS day, 80 AS revenue, 'A' AS userid),
    STRUCT(DATE '2022-05-05' AS day, 100 AS revenue, 'B' AS userid),
    STRUCT(DATE '2022-05-05' AS day, 110 AS revenue, 'D' AS userid),
    STRUCT(DATE '2022-05-05' AS day, 80 AS revenue, 'C' AS userid)
  ])
),
daily_userids AS (SELECT DISTINCT day, userid from table1),
daily_revenue AS (SELECT day, SUM(revenue) AS revenue FROM table1 GROUP BY day),
-- Find number of users in last 2 days from the date
table_users AS (
  SELECT t1.day, COUNT(DISTINCT t2.userid) AS users
  FROM daily_userids AS t1 JOIN daily_userids AS t2
  ON t1.day BETWEEN t2.day AND DATE_ADD(t2.day, INTERVAL 1 DAY)
  GROUP BY t1.day
),
-- Find total revenue in last 2 days from the date
table_revenue AS (
  SELECT day, SUM(revenue) OVER (ORDER BY day DESC ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS revenue
  FROM daily_revenue
)
-- FIND ARPU
SELECT day, users, revenue, ROUND(revenue / users, 2) AS arpu
FROM table_users JOIN table_revenue USING (day)
ORDER BY day DESC
Output:
| day | users | revenue | arpu | 
|---|---|---|---|
| 2022-05-09 | 4 | 580 | 145.0 | 
| 2022-05-08 | 4 | 580 | 145.0 | 
| 2022-05-07 | 4 | 580 | 145.0 | 
| 2022-05-06 | 5 | 580 | 116.0 | 
| 2022-05-05 | 3 | 290 | 96.67 | 
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source | 
|---|---|
| Solution 1 | 
