'Tuning slow query to SYS tables
I'm monitoring the slowest queries to my databases and this one can't be indexed:
USE [Project]
SELECT TOP 10
t.NAME AS TableName,
t.object_id,
mainIndex.data_space_id,
p.rows AS RowCounts,
SUM(a.total_pages) * 8 AS TotalSpaceKB,
SUM(CASE i.[type]
WHEN 2 THEN 8 * a.used_pages
ELSE 0
END) AS IndexSpaceKB
FROM [sys].[tables] t
INNER JOIN sys.indexes mainIndex ON (t.object_id = mainIndex.object_id AND mainIndex.[type] IN (0,1))
INNER JOIN sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN [sys].[partitions] p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN [sys].[allocation_units] a ON p.partition_id = a.container_id
WHERE t.is_ms_shipped = 0 AND i.OBJECT_ID > 255 GROUP BY t.Name, t.object_id, mainIndex.data_space_id, p.Rows
ORDER BY TotalSpaceKB DESC
Could someone please advise as to what can be done so I don't have to lose 20+ seconds running this query against a database with thousands of tables?
Solution 1:[1]
You are joining lots of complex views together before aggregating, ordering and getting the top 10 rows. We can see in the actual execution plan that these joins are taking about 2 seconds each because of the amount of data. I would expect pushing this aggregation so it can happen sooner so you can apply the top 10 sooner and then joining to the other views would be faster. It's also worth seeing if you can skip joins by deriving data elsewhere.
Something like this:
select t.name table_name
,sq.object_id
,mainIndex.data_space_id
,sq.max_rows RowCounts
,sq.TotalSpaceKB
,sq.IndexSpaceKB
from (select top 10
p.object_id, min(p.index_id) main_index_id, max(p.rows) max_rows, 8*sum(a.total_pages) TotalSpaceKB, 8*sum(case when p.index_id > 1 THEN a.used_pages else 0 end ) IndexSpaceKB
from sys.partitions p
inner hash join (select container_id, sum( total_pages) total_pages, sum( used_pages) used_pages from [sys].[allocation_units] a group by container_id) a
on p.partition_id = a.container_id
where p.object_id > 255
group by p.object_id
order by TotalSpaceKB desc
) sq
join sys.tables t
on sq.object_id = t.object_id
join sys.indexes mainIndex
on sq.main_index_id = mainIndex.index_id
and sq.object_id = mainIndex.object_id
where t.is_ms_shipped = 0
Here I've found the top 10 table object_ids by size and then I join to figure out what the tables were. I'm using index_id rather than type (which exists in sys.partitions to determine whether it is the main table partition or a non-clustered index.
Now, the caveat is that if the top 10 tables by size include a table filtered out by the sys.tables join then you could miss data. If this happens then it will be obvious by having less than 10 results. In that scenario, your user tables are probably so small that you might not be that interested in which are the biggest.
I don't have any massive systems to test this on. On my smaller system, I needed to hint the hash join to my pushed aggregation of sys.allocation_units. You might not find this.
It's possible that using the tables directly rather than the views on top of them may be faster still if you really need to squeeze every bit of performance out of it.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Andrew Sayer |
