'BIGQUERY / my queries are very slow and expensive

I have a database in Firestore, send the information to Bigquery with a extension 'Stream Collections to BigQuery'.

But I have the tables with all the data in the same column called 'data'. Also, the data shows all the records/logs, and I just want to see the latest status.

To be able to see the data separated by columns and see the last value, create this query and them save as view.

SELECT
  document_name,
  document_id,
  timestamp,
  event_id,
  operation,
  created_at, 
  status_old, 
  status_new
 

FROM
  (
    SELECT
      document_name,
      document_id,
      FIRST_VALUE(timestamp) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS timestamp,
      FIRST_VALUE(event_id) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS event_id,
      FIRST_VALUE(operation) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS operation,
   FIRST_VALUE(`proyect_name.DatasetID.firestoreTimestamp`(JSON_EXTRACT(data, '$.timestamp'))) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS created_at,
         FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.metadata.old')) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS status_old,
              FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.metadata.new')) OVER(
        PARTITION BY document_name
        ORDER BY
          timestamp DESC
      ) AS status_new

    FROM
      table1
    ORDER BY
      document_name,
      timestamp DESC
  )
GROUP BY
  document_name,
  document_id,
  timestamp,
  event_id,
  operation,
  created_at, 
  status_old, 
  status_new

(I have several fields to add in addition to these, but it is just an example).

For all my querys, I use to use this 'view'.

The problem I have is that this query /view, takes a long time and cost:

Bytes processed: 555.78 MB
Bytes billed: 556 MB
Elapsed time: 43 sec
Slot time consumed: 4 min 29 sec
Bytes shuffled: 836.62 MB

Is there a way to make this more optimal?

Solution 1:^[1]

Hope these are helpful:

Remove ORDER BY in your subquery first. It will slow down your query and doesn't make any difference to the result.
It seems all the FIRST_VALUE come from same row. If so, you can simplify your query like this.

SELECT document_name,
       document_id,
       first_values.*
FROM (
  SELECT document_name,
         document_id,
         FIRST_VALUE(STRUCT(
           timestamp,
           event_id,
           operation,
           `proyect_name.DatasetID.firestoreTimestamp`(JSON_EXTRACT(data, '$.timestamp')) AS created_at,
           JSON_EXTRACT_SCALAR(data, '$.metadata.old') AS status_old,
           JSON_EXTRACT_SCALAR(data, '$.metadata.new') AS status_new
          ) OVER (PARTITION BY document_name ORDER BY timestamp DESC) AS first_values
     FROM table1
)

Lastly, you seems to try to remove duplicates with GROUP BY in your outer(main) query. But I don't think your inner query make duplicates. So, check if your subquery makes duplicates first and if not, remove GROUP BY from the outer query.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1

'BIGQUERY / my queries are very slow and expensive

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]