
Monday, June 19, 2023

List Top Tables in SQL Server Database by Size


There are countless queries you can find on the internet to list tables, database size, index size, and so on. Here is one that I have used for my own needs in my current employment: I am interested not only in the size of each table but also in the size of each index within it.

Note: This query relies on metadata/catalog views to get the table size, row count, etc., and SQL Server does a pretty good job of keeping them up to date. In rare cases, if you are in doubt about the accuracy of the results, you can run the following command in the database to correct any inaccuracies. On a relatively large database this can take a while to complete, so keep that in mind. Generally, though, you do not need to run DBCC UPDATEUSAGE on a regular basis.

DBCC UPDATEUSAGE (0) WITH NO_INFOMSGS

Query to get the top 100 tables by size, largest to smallest. A data page in SQL Server is 8 KB, so page counts are converted to MB with * 8 / 1024:

-- DBCC UPDATEUSAGE (0) WITH NO_INFOMSGS;
SELECT TOP 100
    s.name AS schema_name,
    tbl.name AS table_name,
    SUM(SUM(au.total_pages) * 8 / 1024) OVER (PARTITION BY ind.object_id) AS total_table_size_mb,
    SUM(SUM(au.used_pages)  * 8 / 1024) OVER (PARTITION BY ind.object_id) AS total_used_space_mb,

    ind.name AS index_name,
    ind.type_desc AS index_type,
    INDEXPROPERTY(ind.object_id, ind.name, 'IsUnique') AS is_unique,
    (SELECT COUNT(*) FROM sys.index_columns ic
      WHERE ic.object_id = ind.object_id AND ic.index_id = ind.index_id
        AND ic.is_included_column = 0) AS indexed_columns,
    (SELECT COUNT(*) FROM sys.index_columns ic
      WHERE ic.object_id = ind.object_id AND ic.index_id = ind.index_id
        AND ic.is_included_column = 1) AS included_columns,

    SUM(p.rows) AS ind_row_count,
    SUM(au.total_pages) * 8 / 1024 AS ind_size_mb,
    SUM(au.used_pages)  * 8 / 1024 AS ind_used_size_mb,
    SUM(au.used_pages)  * 8 / SUM(p.rows) AS ind_avg_row_size_kb,
    (SUM(au.total_pages) - SUM(au.used_pages)) * 8 / 1024 AS unused_space_mb,
    INDEXPROPERTY(ind.object_id, ind.name, 'IndexDepth') AS index_depth,
    ind.index_id
FROM
    sys.tables tbl
    INNER JOIN sys.indexes ind ON tbl.object_id = ind.object_id
    INNER JOIN sys.partitions p ON ind.object_id = p.object_id AND ind.index_id = p.index_id
    INNER JOIN sys.allocation_units au ON p.partition_id = au.container_id
    INNER JOIN sys.schemas s ON tbl.schema_id = s.schema_id
-- WHERE tbl.name = '<Table_Name>'
GROUP BY s.name, ind.object_id, ind.index_id, tbl.name, ind.name, ind.type_desc
HAVING SUM(p.rows) > 0
ORDER BY total_used_space_mb DESC;







What Do You Look for in an Execution Plan?

Say you have a query with performance issues and you decide to look at its graphical execution plan in SSMS (or another tool). If you are lucky, right off the bat SQL Server reveals a quick and easy fix:




It's missing a crucial index! You add the missing index, the problem is solved, and you are the hero of the hour!

Sooner or later, though, you will come across queries that don't have such obvious fixes, a.k.a. low-hanging fruit. And you will realize that analyzing execution plans is tedious and time consuming; it is probably impossible to read every property of every node of every query. So you decide to be smarter about it and apply the Pareto principle: 1) focus your energy on the 20% of queries causing 80% of the performance issues (or, if there is no pressing problem to solve per se, the queries you most want to run better), and 2) within the execution plans for those queries, devote your attention to the highest-impact or most informative operators. For example:


Plan metadata:

Start by reviewing the plan metadata. It lives in the very first operator, shown on the far left of the plan, and as you can see in this screenshot, the information displayed is, at the least, informative:



At the bottom you see the "Warnings" associated with this plan, all in one place.


Warnings:


Warnings are indicators of a possible issue with the query that could affect performance. A warning doesn't necessarily mean there is a real problem; it is just a flag that something might be wrong and should be investigated.





The tooltip will show the reason for the warning:




In that particular case, if I am not using this column in a WHERE clause, it is most likely a false alarm, as it cannot then affect the row-count estimates. In the following example, however, both queries show a warning in the execution plan due to the use of the CAST function, but it is only the second query whose performance suffers: there, CAST is applied in the WHERE clause, making the predicate essentially non-sargable. First, you will notice that the second query does a scan, whereas the first query uses a seek.


And second, the second query requires significantly (relatively speaking) more I/O to produce the same results.

(1 row affected)
Table 'TBL_______________'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 67, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)

(1 row affected)
Table 'TBL_______________'. Scan count 1, logical reads 1367, physical reads 0, read-ahead reads 0, lob logical reads 69, lob physical reads 0, lob read-ahead reads 0.

(1 row affected)
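The pattern can be sketched as follows; the table and column names here are hypothetical stand-ins, not the ones from the redacted output above:

```sql
SET STATISTICS IO ON;  -- show logical reads for each query

-- Sargable: CAST appears only in the SELECT list, so an index
-- on OrderDate can still be used for a seek.
SELECT CAST(OrderDate AS date) AS order_date
FROM dbo.Orders
WHERE OrderDate >= '2023-06-19' AND OrderDate < '2023-06-20';

-- Non-sargable: CAST wraps the indexed column in the WHERE clause,
-- so SQL Server must scan and convert every row before filtering.
SELECT OrderDate
FROM dbo.Orders
WHERE CAST(OrderDate AS date) = '2023-06-19';
```

The open-ended date range in the first query is the usual rewrite for a date CAST in a WHERE clause: it filters the same rows without wrapping the indexed column in a function.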


Most Costly Operations:


Even though the costs listed are only estimates, the optimizer relies on them, and therefore we too should pay attention to them.




Fat Pipes:

Pipes are the arrows that connect one operator to the next and represent data movement. The width of a pipe indicates the amount of data being processed: a thick pipe means lots of data, a thin one means less. Also pay attention to changes in pipe width, whether from thin to fat, fat to thin, or a fat pipe getting even fatter.







Scans: 

Relatively speaking, seeks are better than scans. It's not that seeks are always good or that scans are always bad, but scans do imply I/O, which is often the issue in query performance. Sometimes the optimizer will choose a scan even when a seek is available, and often it's right. As a simplistic example, if my query is SELECT * FROM MYTABLE; with no WHERE clause or any other filtering condition, the optimizer can only choose a scan; in fact, for such queries a seek would run slower than a scan.

In short, look at the scan operators more closely than say seek operators. 
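As a minimal sketch of the two situations (table and column names assumed):

```sql
-- No predicate: every row is needed, so a full scan is the only sensible plan.
SELECT * FROM dbo.MyTable;

-- Selective predicate on an indexed column: a seek reads only the matching rows.
SELECT * FROM dbo.MyTable WHERE Id = 42;
```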




Huge discrepancies between estimated and actual counts

For example, in the screenshots below the estimated number of rows is only 8, but the actual plan returned 5.1 million rows; that's a huge discrepancy.





This could be the result of 1) outdated statistics, 2) parameter sniffing, 3) a bad query plan getting cached (use WITH RECOMPILE to get rid of it), or sometimes a combination of these factors.
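To check the first possibility, you can look at when statistics were last updated and refresh them if they are stale; the table and column names below are placeholders:

```sql
-- When were the statistics on this table last updated?
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.MyTable');

-- Refresh them (FULLSCAN is thorough but can be slow on large tables):
UPDATE STATISTICS dbo.MyTable WITH FULLSCAN;

-- Or sidestep a bad cached plan for a single statement
-- (@OrderDate as in a parameterized query):
SELECT Id, OrderDate
FROM dbo.MyTable
WHERE OrderDate = @OrderDate
OPTION (RECOMPILE);
```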

Another discrepancy to pay attention to is a very high "Estimated Number of Rows to be Read" compared to a very low "Estimated Number of Rows Per Execution". This may be an indication that your WHERE clause is non-sargable. A very simple example is using an aggregate function (SUM, COUNT, etc.) over all rows.
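For instance (hypothetical table):

```sql
-- Every row must be read to compute the aggregate, so "Estimated Number
-- of Rows to be Read" covers the whole table while the operator itself
-- returns just one row.
SELECT COUNT(*) AS row_count
FROM dbo.Orders;
```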






Final thoughts:


Often this approach quickly identifies query performance problems. Even when it doesn't, it's a good starting strategy. I will admit it has its limits and won't always be enough, or even "good enough", and at that point you have to put in the real, hard work of understanding and optimizing your query.

When you get there, it's a good idea to start paying attention to the NodeID of each operator. NodeIDs are integer values, 0...n, assigned left to right; they tell you the order in which your query is processed by SQL Server, and some operators even refer back to other operators.

Next, keep track of the output of each operator; some operators add or change columns in the results, so knowing where each column came from can be very helpful.

And finally, to get the complete picture, get into the habit of reviewing the properties of each operator rather than relying on just the information in the tooltip.