'How do I get the last update time of a sequence of tables in BigQuery?

A BigQuery best practice is to split timeseries in daily tables (as "NAME_yyyyMMdd") and then use Table Wildcards to query one or more of these tables.

Sometimes it is useful to get the last update time on a certain set of data (i.e. to check correctness of the ingestion procedure). How do I get the last update time over a set of tables organized like that?



Solution 1:[1]

SELECT * FROM project_name.data_set_name.INFORMATION_SCHEMA.PARTITIONS where table_name='my_table';

Solution for Google

Solution 2:[2]

A good way to achieve that is to use the __TABLES__ meta-table. Here is a generic query I use in several projects:

SELECT
  MAX(last_modified_time) LAST_MODIFIED_TIME,
  IF(REGEXP_MATCH(RIGHT(table_id,8),"[0-9]{8}"),LEFT(table_id,LENGTH(table_id) - 8),table_id) AS TABLE_ID
FROM
  [my_dataset.__TABLES__]
GROUP BY
  TABLE_ID

It will return the last update time of every table in my_dataset. For tables organized with a daily-split structure, it will return a single value (the update time of the latest table), with the initial part of their name as TABLE_ID.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Stas_Shch
Solution 2