'How to know which among my tables falls under long term storage in bigQuery?
I currently have multiple datasets within a project and I would like to know if there's a way to know at one go what are the tables that are not updated in the last 90 days which means it falls under the long term storage bucket in bigquery. Is there some sql syntax for this?
Thank you
Solution 1:[1]
Not SQL but a possible solution is using the bqtool. Describing a table will return the number of Long Term Storage bytes (numLongTermBytes):
$ bq show --format=prettyjson dataset.table
...
"kind": "bigquery#table",
"lastModifiedTime": "1534845362446",
"location": "US",
"numBytes": "56",
"numLongTermBytes": "56",
"numRows": "3",
...
Therefore, you can extend it to list all tables in each dataset and get the desired information from each. Based from this I did a quick example:
#!/bin/bash
project=PROJECT_ID
dataset=DATASET_NAME
max_results=100
# get list of tables
tables=$(bq ls --max_results $max_results "$project:$dataset" | awk '{print $1}' | tail -n +3)
# get LTS bytes for each table
for table in $tables
do
printf '%-35s %-50s\n' "$table" "$(bq show --format prettyjson $project:$dataset.$table | grep numLongTermBytes)"
done
and you'll get an output similar to:
Dfp "numLongTermBytes": "0",
SO_55506947 "numLongTermBytes": "144",
SO_55506947_bis "numLongTermBytes": "144",
a "numLongTermBytes": "7",
a1 "numLongTermBytes": "399",
aaa "numLongTermBytes": "8",
adaptive "numLongTermBytes": "1085",
adaptive_view "numLongTermBytes": "0",
audience_segment_map_exp_test "numLongTermBytes": "300",
b "numLongTermBytes": "7",
...
Solution 2:[2]
I had the same question, but needed it across all the datasets in a given project. I took @Guillem Xercavins' answer and wrapped it in another for loop and made the project a command line input.
#!/bin/bash
project=$1
max_results=1000
# Get the datasets
datasets=$(bq ls --datasets=true --project_id=$project | awk '{print $1}' | tail -n +3)
for dataset in $datasets
do
# get list of tables
# The tail -n +3 strips off the header lines
tables=$(bq ls --max_results $max_results "$project:$dataset" | awk '{print $1}' | tail -n +3)
# get LTS bytes for each table
for table in $tables
do
printf '%-35s %-50s\n' "$table" "$(bq show --format prettyjson $project:$dataset.$table | grep numLongTermBytes)"
done
done
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Guillem Xercavins |
| Solution 2 | nomadic_squirrel |
