Search This Blog

Wednesday, July 12, 2023

A high level view of the most costly queries by CPU

A high level view of the most costly queries by CPU

This article is more about identifying High CPU Usage Queries in SQL Server at the Batch and Stored Procedure Levels

When CPU usage spikes on a SQL Server instance, you rarely have time to guess which queries are behind it. This post shows a quick, high‑level way to identify the top CPU consumers so you can decide where tuning effort will pay off first.

However, focusing solely on individual statements might not always provide the complete picture. Instead, examining queries at a batch or stored procedure level offers a broader view, enabling more effective optimizations. This approach aggregates CPU usage across all statements within a batch or stored procedure, facilitating a targeted investigation of high-impact areas.

The goal here is to aggregate CPU usage by all statements within the same batch or stored procedure, identified by their shared 'plan_handle'. This high-level view helps in pinpointing which queries are most costly in terms of CPU resources. Later on, we can drill down to individual statements with the highest CPU consumption for detailed analysis.


Important Note: CPU time is measured in microseconds, where 1,000,000 microseconds equal one second. For demonstration purposes, this guide focuses on the top 5 queries based on CPU usage, with at least one execution and cached execution plans. Adjustments may be necessary depending on the activity level and type in your specific SQL instance.


Why do multiple statements share the same plan_handle?

When you submit a batch (for example, a stored procedure or a script with several statements), SQL Server usually compiles the whole batch into one execution plan and assigns it a single plan_handle. Inside that plan, each individual statement has its own statistics (CPU, I/O, execution count), but they all still belong to the same plan. That’s why you often see many rows in sys.dm_exec_query_stats with the same plan_handle but different statement offsets – they are simply different statements from the same compiled batch or stored procedure.



The following query aggregates CPU times for batches or stored procedures, ranking them to identify the top consumers.



IF OBJECT_ID('tempdb..#top_costly_queries') IS NOT NULL
DROP TABLE #top_costly_queries;
GO

;WITH cte AS 
(
    SELECT TOP 5 
            SUM(qs.total_worker_time) AS total_cpu_time,
            SUM(qs.execution_count) AS total_execution_count,
            SUM(qs.total_worker_time) / SUM(qs.execution_count) AS avg_cpu_time,
            SUM(qs.total_elapsed_time) / SUM(qs.execution_count) AS avg_elapsed_time,
            COUNT(*) AS number_of_statements,
            qs.plan_handle
    FROM   sys.dm_exec_query_stats AS qs
    GROUP  BY qs.plan_handle
    HAVING SUM(qs.execution_count) > 1
)
SELECT 
    DB_NAME(qp.dbid) AS db_name,
    COALESCE(QUOTENAME(DB_NAME(qp.dbid)) + N'.' + QUOTENAME(OBJECT_SCHEMA_NAME(qp.objectid, qp.dbid)) + 
             N'.' + QUOTENAME(OBJECT_NAME(qp.objectid, qp.dbid)), '') AS obj_name,
    qp.objectid,
    qp.query_plan,
    cte.*
INTO   #top_costly_queries
FROM   cte
       CROSS APPLY sys.dm_exec_query_plan(cte.plan_handle) AS qp
ORDER  BY total_cpu_time DESC;






Analyzing the Results

After identifying the batches or stored procedures with the highest CPU usage, we focus on the specific statements within them:


SELECT t.*,
       SUBSTRING(st.text, ( qs.statement_start_offset / 2 ) + 1, (
            (CASE qs.statement_end_offset 
                WHEN -1 THEN DATALENGTH(st.text)
                ELSE qs.statement_end_offset 
            END - qs.statement_start_offset ) / 2 ) + 1) AS statement_text,
       qs.total_worker_time / qs.execution_count AS avg_cpu_time_stmt,
       qs.*,
       st.*
FROM   sys.dm_exec_query_stats AS qs
       INNER JOIN #top_costly_queries AS t ON t.plan_handle = qs.plan_handle
       CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER  BY qs.total_worker_time DESC;


Partial Results:












This DMV query returns the top CPU‑consuming cached plans so you can quickly see which workloads are putting the most pressure on the CPU and warrant further investigation due to their high CPU usage.

Analyzing the results, we might find outliers indicating inefficient queries. The next step involves reviewing the execution plan of these queries, accessible via the 'query_plan' link in the query results, to identify optimization opportunities.



Conclusion


Focusing on CPU usage at the batch or stored procedure level gives you a practical, high‑level starting point for performance tuning. By aggregating CPU across all statements that share a plan, you can quickly identify which workloads are putting the most pressure on the CPU and then drill into the specific statements that drive that cost.  

This DMV‑based approach is best used alongside other tools and metrics, such as waits, I/O statistics, and Query Store, to build a more complete picture of query behavior over time. As always, be mindful when running diagnostic queries in production, especially on busy systems, and schedule deeper analysis during low‑traffic windows whenever possible.  

Regularly reviewing and optimizing these high‑impact queries will help you use CPU resources more efficiently, improve application responsiveness, and delay or avoid unnecessary hardware and licensing upgrades.  



Additional note: CPU and licensing costs

In many environments, SQL Server is licensed per core, so your licensing cost is directly tied to how much CPU capacity you need. When a few inefficient, CPU‑intensive queries keep the processors constantly busy, the “easy” fix often looks like adding more cores, which directly increases your licensing bill.  

In my organization, the total number of licensed cores has steadily increased over time, and this trend has started to attract more attention from management. That experience is a big part of why I focus on finding and tuning CPU‑heavy queries: every optimization that reduces CPU pressure helps us use existing cores more efficiently and can delay, or sometimes avoid, the need to pay for additional licenses.