'How to get table/column usage statistics in Redshift

I want to find which tables/columns in Redshift remain unused in the database in order to do a clean-up.

I have been trying to parse the queries from the stl_query table, but it turns out this is a quite complex task for which I haven't found any library that I can use.

Anyone knows if this is somehow possible?

Thank you!



Solution 1:[1]

The column question is a tricky one. For table use information I'd look at stl_scan which records info about every table scan step performed by the system. Each of these is date-stamped so you will know when the table was "used". Just remember that system logging tables are pruned periodically and the data will go back for only a few days. So may need a process to view table use daily to get extended history.

I ponder the column question some more. One thought is that query ids will also be provided in stl_scan and this could help in identifying the columns used in the query text. For every query id that scans table_A search the query text for each column name of the table. Wouldn't be perfect but a start.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Bill Weiner