'BIGQUERY / my queries are very slow and expensive
I have a database in Firestore, send the information to Bigquery with a extension 'Stream Collections to BigQuery'.
But I have the tables with all the data in the same column called 'data'. Also, the data shows all the records/logs, and I just want to see the latest status.
To be able to see the data separated by columns and see the last value, create this query and them save as view.
SELECT
document_name,
document_id,
timestamp,
event_id,
operation,
created_at,
status_old,
status_new
FROM
(
SELECT
document_name,
document_id,
FIRST_VALUE(timestamp) OVER(
PARTITION BY document_name
ORDER BY
timestamp DESC
) AS timestamp,
FIRST_VALUE(event_id) OVER(
PARTITION BY document_name
ORDER BY
timestamp DESC
) AS event_id,
FIRST_VALUE(operation) OVER(
PARTITION BY document_name
ORDER BY
timestamp DESC
) AS operation,
FIRST_VALUE(`proyect_name.DatasetID.firestoreTimestamp`(JSON_EXTRACT(data, '$.timestamp'))) OVER(
PARTITION BY document_name
ORDER BY
timestamp DESC
) AS created_at,
FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.metadata.old')) OVER(
PARTITION BY document_name
ORDER BY
timestamp DESC
) AS status_old,
FIRST_VALUE(JSON_EXTRACT_SCALAR(data, '$.metadata.new')) OVER(
PARTITION BY document_name
ORDER BY
timestamp DESC
) AS status_new
FROM
table1
ORDER BY
document_name,
timestamp DESC
)
GROUP BY
document_name,
document_id,
timestamp,
event_id,
operation,
created_at,
status_old,
status_new
(I have several fields to add in addition to these, but it is just an example).
For all my querys, I use to use this 'view'.
The problem I have is that this query /view, takes a long time and cost:
- Bytes processed: 555.78 MB
- Bytes billed: 556 MB
- Elapsed time: 43 sec
- Slot time consumed: 4 min 29 sec
- Bytes shuffled: 836.62 MB
Is there a way to make this more optimal?
Solution 1:[1]
Hope these are helpful:
Remove
ORDER BYin your subquery first. It will slow down your query and doesn't make any difference to the result.It seems all the
FIRST_VALUEcome from same row. If so, you can simplify your query like this.
SELECT document_name,
document_id,
first_values.*
FROM (
SELECT document_name,
document_id,
FIRST_VALUE(STRUCT(
timestamp,
event_id,
operation,
`proyect_name.DatasetID.firestoreTimestamp`(JSON_EXTRACT(data, '$.timestamp')) AS created_at,
JSON_EXTRACT_SCALAR(data, '$.metadata.old') AS status_old,
JSON_EXTRACT_SCALAR(data, '$.metadata.new') AS status_new
) OVER (PARTITION BY document_name ORDER BY timestamp DESC) AS first_values
FROM table1
)
- Lastly, you seems to try to remove duplicates with
GROUP BYin your outer(main) query. But I don't think your inner query make duplicates. So, check if your subquery makes duplicates first and if not, removeGROUP BYfrom the outer query.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
