Search This Blog

Tuesday, June 24, 2025

Measuring and Improving SQL Server Query Plan Cache Efficiency

Measuring and Improving SQL Server Query Plan Cache Efficiency

The query plan cache hit ratio in SQL Server indicates the percentage of queries that are executed using an already generated and available execution plan stored in the SQL Server memory cache. If no suitable plan exists for a query submitted to SQL Server, the query optimizer must create a new plan, which consumes valuable resources and time—a situation known as a "cache miss".  This article though does not cover how SQL Server decides whether a suitable matching plan already exists for reuse. 

Some level of cache misses are expected, especially right after the SQL Server instance starts. However, as the system continues running and the cache fills, cache misses should become rare. A high plan cache hit ratio, typically 90% or above, is desirable and indicates efficient plan reuse. On a highly OLTP system I personally prefer to see that number even higher like 98% or even close to 100%, as a higher ratio means better performance due to reduced CPU and memory usage from plan recompilation.


Conversely, a low hit ratio, such as 75% or worse, may indicate that queries are not being optimized for reuse or that the plan cache size or overall system memory is not enough. You may need to review query patterns or consider increasing the plan cache size by adding more memory to the server (because there is no direct configuration option in SQL Server to explicitly set or increase the plan cache size).


The first step in tackling plan cache efficiency is to measure the plan cache hit ratio:

/*
This query calculates the Plan Cache Hit Ratio as a percentage 
in SQL Server. The Plan Cache Hit Ratio measures how often SQL 
Server is able to reuse execution  plans from the plan cache, 
rather than having to compile new ones. 

A higher ratio indicates better performance, as reusing plans saves 
CPU and memory resources.

How: 
1. The query pulls two values from the dynamic management 
   view sys.dm_os_performance_counters:
- Cache hit ratio: The number of times a cached plan was found and reused.
- Cache hit ratio base: The total number of cache lookups

2. It divides the hit count by the total lookups and multiplies by 
   100 to get a percentage.

3. The RTRIM(object_name) is used to ensure trailing spaces 
   in the object_name column do not affect the filter.
 Output:
    - Returns a single value: the plan cache hit ratio as a percentage.
    - A high value (close to 100%) means most query plans are being reused.
    - A low value may indicate frequent query recompilations, 
      which can impact performance.
*/

SELECT
    100 * 
    (
        -- Numerator: Get the current value of the 'Cache hit ratio' counter
        SELECT cntr_value
        FROM sys.dm_os_performance_counters
        WHERE 
            RTRIM(object_name) LIKE '%:Plan Cache'  -- Filter for the Plan Cache object
            AND counter_name = 'Cache hit ratio'    -- Select the 'Cache hit ratio' counter
            AND instance_name = '_Total'            
    ) 
    /
    (
        -- Denominator: Get the current value of the 'Cache hit ratio base' counter
        SELECT cntr_value
        FROM sys.dm_os_performance_counters
        WHERE 
            RTRIM(object_name) LIKE '%:Plan Cache'      
            AND counter_name = 'Cache hit ratio base'   -- Select the base counter for the ratio
            AND instance_name = '_Total'                
    ) AS [Plan cache hit ratio %];











Tips For Improving Plan Cache Hit Ratio


While the following is by no means a complete or exhaustive list, here are some steps you can take to improve plan reusability and, in turn, increase the plan hit ratio.

When Writing Queries:

  • Use Stored Procedures as much as possible.  Their precompiled nature promotes plan reuse and reduces compilation overhead
  • Avoid writing Ad-Hoc queries and minimize dynamic SQL. If unavoidable, use sp_executesql with parameters instead of EXEC to enable parameterization. Require or at least suggest application developers to utilize query parameterization feature in application code
  • Standardize query formatting for consistent casing, spacing, and punctuation. Even trivial variations (e.g., extra spaces) generate new plan hashes.  SQL Server performs a binary comparison of the incoming query's text with cached queries. Any difference in whitespace, casing, comments, or literals will result in a different plan hash and prevent reuse
  • Consider using a plan guide or a query hint to force parameterization for a specific ad hoc query that generates too many cached execution plans due to variations in the literal values each time it is executed.


Minimize Plan Recompilations

Yes, plan recompilations do factor into cached plan ratio calculations and will lower the overall plan cache hit ratio.

Here are the primary triggers or reasons for plan recompilations:

Trigger Type

Example/Description

Schema change

ALTER TABLE, DROP INDEX, etc.

Statistics update

UPDATE STATISTICS, auto stats update

SET options change

Changing ANSI_NULLS, etc.

sp_recompile

Manual plan removal for object

Query hint

OPTION (RECOMPILE), WITH RECOMPILE

Cache clearing

DBCC FREEPROCCACHE

Query text change

Any modification to the SQL text

Dependent object change

Altering tables/views referenced by procedures   


Apart from those, when SQL Server experiences memory pressure—either because the plan cache itself grows too large (local memory pressure) or because the overall system is low on memory (global memory pressure)—it will proactively remove execution plans from the plan cache to free up memory.

For most systems, the most common reasons involve frequent (and justified) index rebuilds or reorganizations, as well as statistics updates (manual or automatic). This is because it is considered a performance best practice to regularly rebuild indexes and update statistics to keep them compact and current. To that end, we typically schedule a daily/nightly job for index and statistics maintenance.

When you modify, drop, or rebuild an index on a table, while SQL Server does not immediately remove related execution plans from the cache, it does mark them as invalid. The next time a query that references the changed table is executed, SQL Server detects the schema change, recompiles the query, and generates a new execution plan, which is then stored in the cache. So unless you have to, scheduled your index maintenance jobs during off peak hours.

In fact, Creating or dropping indexes can indirectly affect plan cache efficiency. If queries are frequently recompiled due to index changes or if many single-use plans are generated, this can lead to plan cache bloat, reducing cache efficiency and potentially degrading performance.

Same goes for the index statistics. SQL Server query optimizer, being 'cost-based', fundamentally relies on cardinality estimation, using per-attribute summary statistics (histograms) to generate efficient execution plans. When statistics are updated—especially if AUTO_UPDATE_STATISTICS is enabled—SQL Server typically invalidates existing cached plans that depend on those statistics. This triggers recompilation the next time the query runs, ensuring the new plan reflects the updated data distribution. If statistics are stale or missing, SQL Server may generate suboptimal plans, and these can remain in cache until statistics are refreshed.


Query Optimization

If you have a low plan cache hit ratio then you may also have poorly performing queries. There maybe things you can do to identify and improve those queries. 

You should also review the overall system configuration - including the database, instance, and server - for optimization. While application design typically falls outside the DBA's direct control (even though many DBAs have development experience), if you have thoroughly optimized everything within your scope and documented your efforts, it should not be difficult to make a strong case for an application redesign.


Resources:


sys.dm_exec_cached_plans

SQL Server Plan Cache Limits

Forced Parameterization

Can Forced Parameterization Go Wrong


Additional Queries:


-- Get top 10 queries by average elapsed time (in milliseconds)
-- Only include queries that have been executed at least 5 times
SELECT TOP 10
    SUBSTRING(qt.TEXT, (qs.statement_start_offset/2)+1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(qt.TEXT)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset)/2)+1) as query_text,
    qs.execution_count,
    qs.total_logical_reads,
    qs.total_logical_writes,
    qs.total_worker_time/1000 AS total_worker_time_ms,
    qs.total_elapsed_time/1000 AS total_elapsed_time_ms,
    (qs.total_elapsed_time / qs.execution_count) / 1000 avg_elapsed_time_ms,
    qs.last_execution_time ,
    qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
WHERE qs.execution_count > 4 
ORDER BY avg_elapsed_time_ms DESC;


-- Get size of the plan cache by plan type
;WITH cte_plan_cache_usage_by_obj_type AS 
(
	SELECT COALESCE(objtype, '-- ALL Plans') objtype,
           Sum(Cast(size_in_bytes AS BIGINT)) / 
		              1024 / 1024 size_mb
    FROM   sys.dm_exec_cached_plans
    GROUP  BY rollup ( objtype )
)
SELECT objtype,
       size_mb,
	   'percent' = 
		   ( size_mb * 100 ) / 
                       (SELECT size_mb
			FROM   cte_plan_cache_usage_by_obj_type
			WHERE  objtype = '-- ALL Plans'
		   ) 
FROM   cte_plan_cache_usage_by_obj_type
ORDER  BY size_mb DESC



Monday, January 13, 2025

15 Characters or Less: The Challenge of SQL Server Listeners Name

15 Characters or Less: The Challenge of SQL Server Listeners Name

There is a hard limit on how long your AlwaysOn listener name can be. That is expected; what may not be expected is that the limit is a mere 15 characters or less.

Yes, even though the maximum length allowed for an AlwaysOn listener DNS name is 63 characters, in practice you can only specify up to 15 characters. This limit comes from the NetBIOS computer name restriction, which is 15 characters, even in the latest versions of Windows 2025. Technically, the maximum length for a NetBIOS name is 16 characters, but the 16th character (NetBIOS Suffix) is reserved for identifying the network device functionality, effectively limiting it to 15 characters.

The limit is for the name part, not the entire Fully Qualified Domain Name (FQDN), which can be up to 255 characters long.

So if you are using SSMS to create an AlwaysOn Listener, it won't even let you type in more than 15 characters:






When I first encountered this issue a few years ago, that should have been my first clue. Did I pay attention and think for a second? Of course not! I spent close to two hours before zeroing in on what the main issue was.

If you bypass the GUI as I did back then and resort to TSQL (or PowerShell), you will receive error similar to this:

Msg 41066, Level 16, State 0, Line 3

Cannot bring the Windows Server Failover Clustering (WSFC) resource (ID '70ce3804-e3ed-4db6-ad42-6936daeeb5aa') online (Error code 5942). 
The WSFC service may not be running or may not be accessible in its current state, or the WSFC resource may not be in a state that could
accept the request.  For information about this error code, see "System Error Codes" in the Windows Development documentation.

Msg 19476, Level 16, State 4, Line 3

The attempt to create the network name and IP address for the listener failed.

The WSFC service may not be running or may be inaccessible in its current state, or
the values provided for the network name and IP address may be incorrect. Check the state of the WSFC cluster and
validate the network name and IP address with the network administrator.














The error is ugly and not intuitive! SQL Server and FCS logs are not very helpful either. 

As a last resort, you could bypass the SQL Server tools completely and create the listener through the Failover Cluster Manager tool, by adding a new resource to your Availability Group of type 'Client Access Point':








On the surface, this will work and complete successfully:









In reality, the newly added Client Access Point will not come online, and your AlwaysOn resource will show as partially failed











Is there a solution or workaround that works?

Sadly, no. Though justifiably annoying, in the Microsoft Windows world, there is no way to get around this limit. The catch-all reason given is of course - "For backward compatibility with legacy Windows Computers and Applications that may still need compatibility with  NetBIOS". Even if you do not have any legacy systems, this requirement is still enforced. The Windows Failover Clustering still requires that the cluster name (the virtual computer name that you can connect to) must strictly adhere to NetBIOS naming convention. And this requirement extends to related services, such as SQL listener names for Always On Availability Groups. Bottom line: If the Windows Failover Cluster (WFC) service cannot successfully register your chosen name in the network (per NetBIOS standards), the resource will be unable to come online, even if the initial resource creation appears successful.

A potential alternative could be to install your SQL Servers in the Linux world without Windows Failover Clustering and Active Directory. I am told you could have a longer name for the AlwaysOn Listener in this environment. However, I have not tested this myself and therefore would not count it as a definitive solution. Even if it works, isn't that too much effort just to name your listener with more than 15 characters?

I ended up making a slight change in my naming convention so that the listener name would not exceed 15 characters.


Resources:


Configure a listener for an Always On availability group

Friday, January 10, 2025

How to clear the SSMS cache?

How to clear the SSMS cache?

If you are reading this, it's probably because you too have this rare need to clear the SSMS cache for some kind of troubleshooting. 

Generally, when I am troubleshooting a connection issue that I wonder is due to a cached or stale connection, the first things that come to mind are clearing the DNS client cache and, occasionally, the ARP cache. Rarely, if ever, do I consider the SSMS cache.  Of course, there are other reasons why you may want to clear the cache.

Since almost everything in the computer world uses some kind of cache, it's no surprise that SQL Server Management Studio (SSMS) also has its own cache. This cache makes our lives easier by remembering things like most recently used (MRU) connections and objects, and enabling features like IntelliSense, auto-completion, syntax highlighting, object names, our preferences, and even idiosyncrasies. The cache is stored in what Microsoft refers to as a cache file located in your user profile directory. The cache files are not human-readable per se (it would be a security risk if anyone got hold of those cache files), nor do I currently have any desire to read them. And I'm not concerned right now with the size of the cache or the exact algorithm it uses to decide when to automatically refresh it. However, you can use the keyboard shortcut CTRL+SHIFT+R to manually refresh it.

And depending on the version of SQL Server Management Studio (SSMS), the cache folder also contains one or more settings and configuration files. For example, in the case of SSMS 20.x, one such file is NewSettings.vssettings (first introduced in SSMS 17.x), which stores your customized preferences and environment settings,  basically all the options and settings you see here:

Click on Tools menu on top then Options:












The file is created or saved (if it already exists) whenever you close SQL Server Management Studio (SSMS).

If you have customized your SSMS appearance and experience significantly, you should consider backing up this file. It is located at "%USERPROFILE%\AppData\Local\Microsoft\SQL Server Management Studio\%Version%\Settings\SQL Server Management Studio".

A couple of other configuration files you may want to save are Ssms.exe.config and RegSrvr*.xml.


Clearing the cache per se is simple. First, if you have SQL Server Management Studio (SSMS) running, make sure to close it. Copy the following folders somewhere safe in case you need them, then delete them entirely. Alternatively, you could also move (cut and paste) these folders to another location.

  • "%USERPROFILE%\AppData\Local\Microsoft\SQL Server Management Studio"
  • %USERPROFILE%\AppData\Roaming\Microsoft\SQL Server Management Studio"

There is a Microsoft article on clearing everything from SSMS cache:


Clear SSMS Cache Files


Then when you relaunch the SSMS after clearing the cache, you will see the following warning, which you could ignore by clicking OK if you want to start SSMS with a blank slate, or as I just did a bit ago, save the settings files somewhere before deleting the cache files, then copy NewSettings.vssettings back to its original location.










Here is a PowerShell script to backup then clear the SSMS cache:


# Current User's profile location
$ProfilePath = $env:USERPROFILE # Backup location $DateTimeSuffix = Get-Date -Format 'yyyyMMdd_HHmmsss' $BackupPath = Join-Path $ProfilePath "Documents\SQL Server Management Studio - Backup\$DateTimeSuffix" # Create the backup directory if doesn't already exists New-Item -Path $BackupPath -ItemType Directory -Force # Get the folder names where SSMS cache files are stored $SSMSFolders = Get-Item "$ProfilePath\AppData\*\Microsoft\SQL Server Management Studio" # Move the SMSS cache folders to Backup directory ForEach($SSMSFolder in $SSMSFolders) { $DestPath = Join-Path $BackupPath $SSMSFolder.Parent.FullName.Replace("$ProfilePath\AppData\", "") New-Item -Path $DestPath -ItemType Directory -Force Move-Item -Path $SSMSFolder.FullName -Destination $DestPath -Force }