Monday, September 22, 2025

Set Theory vs. Batch Mode in SQL Server


Not long ago, a colleague of mine was completely shocked when he first heard about SQL Server’s Batch Mode feature. His immediate reaction was: “Wait, but SQL is set-based. Doesn’t that mean it already processes everything as a set instead of row by row?”

That’s actually a really common assumption, and it makes sense at first glance. After all, people often describe relational databases as powerful because they “process sets of data at once.” But the reality is a little more nuanced. Let’s clear this up.

Set-Based SQL Is About How You Express Queries

When we say SQL is a set-based language, we’re talking about how you, the database professional, express what you want. SQL is declarative. You write what result set you need, and you leave it up to SQL Server to figure out how to produce it.

For instance:

  • “Give me the top-selling products this season.”

  • “Show me all customers in the West region.”

You usually don't specify whether SQL Server should use a particular index, scan all rows, or parallelize execution. That’s the optimizer’s job.
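
For example, the second request above might be written like this (the table and column names here are made up purely for illustration):

-- Declarative: describe the result set you want, not the access path
SELECT CustomerID, CustomerName
FROM dbo.Customers
WHERE Region = 'West';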

But, and here’s where the misconception comes in, asking for a set result doesn’t mean SQL Server executes it in one giant, magical sweep. Under the covers, the engine usually streams rows through operators in a plan, often one at a time, kind of like an assembly line.

Enter Batch Mode

That’s where Batch Mode changes the game. Instead of passing rows through operators one at a time, Batch Mode allows the engine to group rows together, typically around 900 at a time, and process them as a single unit. Think of it as SQL Server “vectorizing” execution.

Here’s a mental image:

  • Row Mode: You buy 10 apples at the grocery store, and the cashier scans each apple one by one.

  • Batch Mode: The 10 apples come in a bag, and the cashier just scans the bag once.

The result? Less overhead and faster performance, especially for analytic and large-scale queries.

Set Theory vs. Batch Mode

This is why it’s important not to conflate relational set theory with Batch Mode execution. They live in completely different layers:


                         Batch Mode                                    Set Theory / Relational Model
What is it?              Query execution engine optimization           Query writing / formulation approach
How does it work?        Rows handled internally in chunks (~900)      Sets/relations (no row-by-row logic)
Why use it?              Improves efficiency of analytic queries       Promotes declarative, efficient operations
Where does it operate?   Physical (implementation/execution)           Logical (query semantics/design)
Example                  A vectorized batch operator in the plan       SELECT SUM(column) FROM table;


So, SQL being set-based tells us how we describe queries. Batch Mode tells us how the engine might execute them internally. Two different things, both powerful—but they shouldn’t be confused.

Batch Mode Through the Versions

Batch Mode wasn’t always around. It showed up with SQL Server 2012 alongside Columnstore indexes, which were built for data warehousing workloads. For years, Batch Mode was tied entirely to columnstore. Then SQL Server 2019 broadened access by making Batch Mode "available" on regular rowstore tables as well (as long as the database compatibility level is set to 150 or higher).
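
If you are not sure where a database stands, checking and (if appropriate) raising the compatibility level is straightforward. A quick sketch, with a placeholder database name:

-- Check the current compatibility level (150 = SQL Server 2019)
SELECT name, compatibility_level
FROM sys.databases
WHERE name = N'YourDatabase';

-- Raise it so Batch Mode on rowstore can be considered
ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 150;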

Here’s a quick version history:

SQL Server Version    Batch Mode on Columnstore    Batch Mode on Rowstore
Pre-2012              No                           No
2012                  Yes                          No
2014                  Yes                          No
2016                  Yes                          No
2017                  Yes                          No
2019 and later        Yes                          Yes


Of course, even when Batch Mode is available, it doesn’t mean every query will use it. The optimizer decides case by case which execution path is best. Sometimes, Row Mode is still the winner.
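
If you want to see both paths for the same statement, SQL Server 2019 and later also expose query hints that steer the choice. A small sketch follows (the table name is a placeholder; the hints influence the optimizer but do not bypass the normal requirements):

-- Ask the optimizer not to consider Batch Mode for this statement
SELECT Category, SUM(Amount) AS TotalAmount
FROM dbo.YourLargeTable
GROUP BY Category
OPTION (USE HINT('DISALLOW_BATCH_MODE'));

-- The opposite hint, 'ALLOW_BATCH_MODE', lets Batch Mode on rowstore be
-- considered even if it has been disabled at the database scope.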

Wrapping Up

SQL’s set-based nature is about your interface with the language. Batch Mode, on the other hand, is an internal engine optimization that decides how the work actually gets done.



Demo


/* ** Demo: Batch Mode Vs Row Mode **

Let’s walk through a quick demo to see Batch Mode in practice. 
This works in SQL Server 2019 or later, since Batch Mode is available for 
rowstore tables (provided the database compatibility level is 150 or higher).

The script performs the following actions:

1. Creates a BatchModeDemoDB database 
2. Creates table BatchModeDemoTbl
3. Loads data into BatchModeDemoTbl table
4. Runs a query that is likely to use the Batch mode
5. Runs a second query against a smaller number of records to demo the row mode
6. Cleans up the BatchModeDemoDB database

*/


USE master;
GO
IF EXISTS(SELECT * FROM SYS.DATABASES WHERE NAME = 'master' AND compatibility_level < 150 )
BEGIN
	RAISERROR('Batch Mode feature is not available on your current version of SQL Server.', 16, 1)
END;
ELSE
BEGIN
	IF DB_ID('BatchModeDemoDB') IS NOT NULL DROP DATABASE BatchModeDemoDB;
	CREATE DATABASE BatchModeDemoDB;
	ALTER DATABASE [BatchModeDemoDB] SET RECOVERY SIMPLE WITH NO_WAIT;
	ALTER DATABASE [BatchModeDemoDB] MODIFY FILE ( NAME = N'BatchModeDemoDB', SIZE = 100MB)
	ALTER DATABASE [BatchModeDemoDB] MODIFY FILE ( NAME = N'BatchModeDemoDB_log', SIZE = 300MB)
END;
GO

USE BatchModeDemoDB;
ALTER AUTHORIZATION ON DATABASE::[BatchModeDemoDB] TO [sa];

-- Create a test table
DROP TABLE IF EXISTS dbo.BatchModeDemoTbl;
CREATE TABLE dbo.BatchModeDemoTbl
(
    Id       INT IDENTITY(1,1) PRIMARY KEY,
    Category INT NOT NULL,
    Amount   DECIMAL(10,2) NOT NULL
);

-- Insert a large amount of sample data, e.g.  1 million rows
INSERT INTO dbo.BatchModeDemoTbl (Category, Amount)
SELECT TOP (1000000)
    (ABS(CHECKSUM(NEWID())) % 10) AS Category,
    CAST(ABS(CHECKSUM(NEWID())) % 1000 AS DECIMAL(10,2))
FROM sys.objects a
CROSS JOIN sys.objects b
CROSS JOIN sys.objects c;

-- Let's run two sample queries
SET STATISTICS XML ON;

-- ** This query should use the Batch mode **
SELECT Category, SUM(Amount) AS TotalAmount
FROM dbo.BatchModeDemoTbl
GROUP BY Category;


/* ** This next query should use the row mode **
SQL Server is more likely to choose Row Mode 
for small result sets, since the overhead of 
batching isn’t worth it.
*/

SELECT Category, SUM(Amount) AS TotalAmount
FROM dbo.BatchModeDemoTbl
WHERE Id < 1000
GROUP BY Category;


SET STATISTICS XML OFF;
USE master;

-- Clean up
IF DB_ID('BatchModeDemoDB') IS NOT NULL DROP DATABASE BatchModeDemoDB;


After you run the above script, the results should look like this:

  • In the execution plan (Ctrl+M in SSMS or via the XML output), look for operators like Hash Match, Clustered Index Scan or Aggregate.

  • Check the Actual Execution Mode property. If Batch Mode was used, you’ll see:

        Actual Execution Mode = Batch
        Estimated Execution Mode = Batch

If the optimizer decided Row Mode was more efficient, you’ll see “Row.”

Next, try some demo queries on your own, compare Row vs. Batch, and look for execution plan changes.

Tip: For wider queries (more columns, larger aggregates), SQL Server is more likely to choose Batch Mode, since batching pays off with vectorized operators.
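
For example, still against the demo table created above (run it before the cleanup step), a wider aggregate like this is a good candidate to compare with the earlier two queries:

-- A wider aggregate over the same demo table
SELECT Category,
       COUNT(*)    AS RowsPerCategory,
       SUM(Amount) AS TotalAmount,
       AVG(Amount) AS AvgAmount,
       MIN(Amount) AS MinAmount,
       MAX(Amount) AS MaxAmount
FROM dbo.BatchModeDemoTbl
GROUP BY Category;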





Tuesday, June 24, 2025

Measuring and Improving SQL Server Query Plan Cache Efficiency


The query plan cache hit ratio in SQL Server indicates the percentage of queries that are executed using an already generated execution plan stored in the SQL Server memory cache. If no suitable plan exists for a query submitted to SQL Server, the query optimizer must create a new plan, which consumes valuable resources and time, a situation known as a "cache miss". This article, though, does not cover how SQL Server decides whether a suitable matching plan already exists for reuse.

Some level of cache misses is expected, especially right after the SQL Server instance starts. However, as the system continues running and the cache fills, cache misses should become rare. A high plan cache hit ratio, typically 90% or above, is desirable and indicates efficient plan reuse. On a busy OLTP system I personally prefer to see that number even higher, 98% or close to 100%, since a higher ratio means better performance due to reduced CPU and memory usage from plan recompilation.


Conversely, a low hit ratio, such as 75% or worse, may indicate that queries are not written for plan reuse, or that the plan cache, or overall system memory, is insufficient. You may need to review query patterns, or consider growing the plan cache indirectly by adding more memory to the server (there is no configuration option in SQL Server to explicitly set or increase the plan cache size).
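
As a rough sanity check on memory, you can also see how much space the plan cache stores are currently using. This is only a sketch, not a full sizing analysis:

-- Approximate plan cache memory usage by cache store
SELECT [type] AS cache_store,
       SUM(pages_kb) / 1024 AS size_mb
FROM sys.dm_os_memory_clerks
WHERE [type] IN (N'CACHESTORE_SQLCP',   -- ad hoc and prepared plans
                 N'CACHESTORE_OBJCP')   -- stored procedure, trigger, and function plans
GROUP BY [type]
ORDER BY size_mb DESC;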


The first step in tackling plan cache efficiency is to measure the plan cache hit ratio:

/*
This query calculates the Plan Cache Hit Ratio as a percentage 
in SQL Server. The Plan Cache Hit Ratio measures how often SQL 
Server is able to reuse execution  plans from the plan cache, 
rather than having to compile new ones. 

A higher ratio indicates better performance, as reusing plans saves 
CPU and memory resources.

How: 
1. The query pulls two values from the dynamic management 
   view sys.dm_os_performance_counters:
- Cache hit ratio: The number of times a cached plan was found and reused.
- Cache hit ratio base: The total number of cache lookups

2. It divides the hit count by the total lookups and multiplies by 
   100 to get a percentage.

3. The RTRIM(object_name) is used to ensure trailing spaces 
   in the object_name column do not affect the filter.
 Output:
    - Returns a single value: the plan cache hit ratio as a percentage.
    - A high value (close to 100%) means most query plans are being reused.
    - A low value may indicate frequent query recompilations, 
      which can impact performance.
*/

SELECT
    100.0 * 
    (
        -- Numerator: Get the current value of the 'Cache hit ratio' counter
        SELECT cntr_value
        FROM sys.dm_os_performance_counters
        WHERE 
            RTRIM(object_name) LIKE '%:Plan Cache'  -- Filter for the Plan Cache object
            AND counter_name = 'Cache hit ratio'    -- Select the 'Cache hit ratio' counter
            AND instance_name = '_Total'            
    ) 
    /
    (
        -- Denominator: Get the current value of the 'Cache hit ratio base' counter
        SELECT cntr_value
        FROM sys.dm_os_performance_counters
        WHERE 
            RTRIM(object_name) LIKE '%:Plan Cache'      
            AND counter_name = 'Cache hit ratio base'   -- Select the base counter for the ratio
            AND instance_name = '_Total'                
    ) AS [Plan cache hit ratio %];


Tips For Improving Plan Cache Hit Ratio


While the following is by no means a complete or exhaustive list, here are some steps you can take to improve plan reusability and, in turn, increase the plan hit ratio.

When Writing Queries:

  • Use stored procedures as much as possible. Their precompiled nature promotes plan reuse and reduces compilation overhead
  • Avoid ad-hoc queries and minimize dynamic SQL. If unavoidable, use sp_executesql with parameters instead of EXEC to enable parameterization (see the short example after this list). Encourage, or better yet require, application developers to use parameterized queries in application code
  • Standardize query formatting for consistent casing, spacing, and punctuation. Even trivial variations (e.g., extra spaces) generate new plan hashes. SQL Server performs a binary comparison of the incoming query's text with cached queries, so any difference in whitespace, casing, comments, or literals will result in a different plan hash and prevent reuse
  • Consider using a plan guide or a query hint to force parameterization for a specific ad hoc query that generates too many cached execution plans due to variations in the literal values each time it is executed
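
To illustrate the sp_executesql point above, compare the two patterns below; the table, column, and parameter names are invented for the example:

-- Hard to reuse: every literal value produces different query text,
-- so each execution can end up with its own cached plan
EXEC (N'SELECT OrderID, TotalDue FROM dbo.Orders WHERE CustomerID = 42;');

-- Reusable: the parameterized text stays identical across executions,
-- so one cached plan can serve all parameter values
EXEC sp_executesql
    N'SELECT OrderID, TotalDue FROM dbo.Orders WHERE CustomerID = @CustomerID;',
    N'@CustomerID INT',
    @CustomerID = 42;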


Minimize Plan Recompilations

Yes, plan recompilations do factor into cached plan ratio calculations and will lower the overall plan cache hit ratio.
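
You can get a rough sense of how often compilations and recompilations are happening from the SQL Statistics counters; note that the values exposed through this DMV are cumulative since instance startup:

-- Compilations vs. recompilations since instance startup
SELECT RTRIM(counter_name) AS counter_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE RTRIM(object_name) LIKE '%:SQL Statistics'
  AND counter_name IN (N'Batch Requests/sec',
                       N'SQL Compilations/sec',
                       N'SQL Re-Compilations/sec');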

Here are the primary triggers or reasons for plan recompilations:

Trigger Type              Example/Description
Schema change             ALTER TABLE, DROP INDEX, etc.
Statistics update         UPDATE STATISTICS, auto stats update
SET options change        Changing ANSI_NULLS, etc.
sp_recompile              Manual plan removal for an object
Query hint                OPTION (RECOMPILE), WITH RECOMPILE
Cache clearing            DBCC FREEPROCCACHE
Query text change         Any modification to the SQL text
Dependent object change   Altering tables/views referenced by procedures


Apart from those, when SQL Server experiences memory pressure—either because the plan cache itself grows too large (local memory pressure) or because the overall system is low on memory (global memory pressure)—it will proactively remove execution plans from the plan cache to free up memory.

For most systems, the most common reasons involve frequent (and justified) index rebuilds or reorganizations, as well as statistics updates (manual or automatic). This is because it is considered a performance best practice to regularly rebuild indexes and update statistics to keep them compact and current. To that end, we typically schedule a daily/nightly job for index and statistics maintenance.

When you modify, drop, or rebuild an index on a table, SQL Server does not immediately remove the related execution plans from the cache, but it does mark them as invalid. The next time a query that references the changed table is executed, SQL Server detects the schema change, recompiles the query, and generates a new execution plan, which is then stored in the cache. So unless you have no other option, schedule your index maintenance jobs during off-peak hours.

In fact, creating or dropping indexes can indirectly affect plan cache efficiency. If queries are frequently recompiled due to index changes, or if many single-use plans are generated, this can lead to plan cache bloat, reducing cache efficiency and potentially degrading performance.
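
A quick way to check whether single-use plans are bloating the cache is to group cached plans by how often they have been reused, for example:

-- How much of the plan cache is taken up by plans that were used only once?
SELECT objtype,
       COUNT(*) AS single_use_plans,
       SUM(CAST(size_in_bytes AS BIGINT)) / 1024 / 1024 AS size_mb
FROM sys.dm_exec_cached_plans
WHERE usecounts = 1
GROUP BY objtype
ORDER BY size_mb DESC;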

The same goes for index statistics. The SQL Server query optimizer, being cost-based, fundamentally relies on cardinality estimation, using per-attribute summary statistics (histograms) to generate efficient execution plans. When statistics are updated, especially if AUTO_UPDATE_STATISTICS is enabled, SQL Server typically invalidates existing cached plans that depend on those statistics. This triggers recompilation the next time the query runs, ensuring the new plan reflects the updated data distribution. If statistics are stale or missing, SQL Server may generate suboptimal plans, and these can remain in cache until statistics are refreshed.


Query Optimization

If you have a low plan cache hit ratio, you may also have poorly performing queries. There may be things you can do to identify and improve those queries.

You should also review the overall system configuration - including the database, instance, and server - for optimization. While application design typically falls outside the DBA's direct control (even though many DBAs have development experience), if you have thoroughly optimized everything within your scope and documented your efforts, it should not be difficult to make a strong case for an application redesign.


Resources:


sys.dm_exec_cached_plans

SQL Server Plan Cache Limits

Forced Parameterization

Can Forced Parameterization Go Wrong


Additional Queries:


-- Get top 10 queries by average elapsed time (in milliseconds)
-- Only include queries that have been executed at least 5 times
SELECT TOP 10
    SUBSTRING(qt.TEXT, (qs.statement_start_offset/2)+1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(qt.TEXT)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset)/2)+1) as query_text,
    qs.execution_count,
    qs.total_logical_reads,
    qs.total_logical_writes,
    qs.total_worker_time/1000 AS total_worker_time_ms,
    qs.total_elapsed_time/1000 AS total_elapsed_time_ms,
    (qs.total_elapsed_time / qs.execution_count) / 1000 avg_elapsed_time_ms,
    qs.last_execution_time ,
    qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
WHERE qs.execution_count > 4 
ORDER BY avg_elapsed_time_ms DESC;


-- Get size of the plan cache by plan type
;WITH cte_plan_cache_usage_by_obj_type AS
(
    SELECT COALESCE(objtype, '-- ALL Plans') AS objtype,
           SUM(CAST(size_in_bytes AS BIGINT)) / 1024 / 1024 AS size_mb
    FROM   sys.dm_exec_cached_plans
    GROUP  BY ROLLUP ( objtype )
)
SELECT objtype,
       size_mb,
       'percent' = ( size_mb * 100 ) /
                   ( SELECT size_mb
                     FROM   cte_plan_cache_usage_by_obj_type
                     WHERE  objtype = '-- ALL Plans' )
FROM   cte_plan_cache_usage_by_obj_type
ORDER  BY size_mb DESC;



Monday, January 13, 2025

15 Characters or Less: The Challenge of SQL Server Listeners Name


There is a hard limit on how long your AlwaysOn listener name can be. That is expected; what may not be expected is that the limit is a mere 15 characters.

Yes, even though the maximum length allowed for an AlwaysOn listener DNS name is 63 characters, in practice you can only specify up to 15 characters. This limit comes from the NetBIOS computer name restriction of 15 characters, which still applies even in Windows Server 2025. Technically, the maximum length of a NetBIOS name is 16 characters, but the 16th character (the NetBIOS suffix) is reserved for identifying the network device's functionality, effectively limiting the name to 15 characters.

The limit is for the name part, not the entire Fully Qualified Domain Name (FQDN), which can be up to 255 characters long.

So if you are using SSMS to create an AlwaysOn Listener, it won't even let you type in more than 15 characters:

When I first encountered this issue a few years ago, that should have been my first clue. Did I pay attention and think for a second? Of course not! I spent close to two hours before zeroing in on what the main issue was.

If you bypass the GUI as I did back then and resort to T-SQL (or PowerShell), you will receive an error similar to this:

Msg 41066, Level 16, State 0, Line 3

Cannot bring the Windows Server Failover Clustering (WSFC) resource (ID '70ce3804-e3ed-4db6-ad42-6936daeeb5aa') online (Error code 5942). 
The WSFC service may not be running or may not be accessible in its current state, or the WSFC resource may not be in a state that could
accept the request.  For information about this error code, see "System Error Codes" in the Windows Development documentation.

Msg 19476, Level 16, State 4, Line 3

The attempt to create the network name and IP address for the listener failed.

The WSFC service may not be running or may be inaccessible in its current state, or
the values provided for the network name and IP address may be incorrect. Check the state of the WSFC cluster and
validate the network name and IP address with the network administrator.


The error is ugly and not intuitive! The SQL Server and failover cluster logs are not very helpful either.

As a last resort, you could bypass the SQL Server tools completely and create the listener through the Failover Cluster Manager tool, by adding a new resource to your Availability Group of type 'Client Access Point':

On the surface, this will work and complete successfully:

In reality, the newly added Client Access Point will not come online, and your AlwaysOn resource will show as partially failed.

Is there a solution or workaround that works?

Sadly, no. Annoying as it is, in the Microsoft Windows world there is no way to get around this limit. The catch-all reason given is, of course, backward compatibility with legacy Windows computers and applications that may still rely on NetBIOS. Even if you do not have any legacy systems, this requirement is still enforced. Windows Failover Clustering still requires that the cluster network name (the virtual computer name that you connect to) strictly adhere to the NetBIOS naming convention, and this requirement extends to related resources, such as listener names for Always On Availability Groups. Bottom line: if the Windows Server Failover Clustering (WSFC) service cannot successfully register your chosen name on the network (per NetBIOS standards), the resource will be unable to come online, even if the initial resource creation appears successful.

A potential alternative could be to install your SQL Servers in the Linux world without Windows Failover Clustering and Active Directory. I am told you could have a longer name for the AlwaysOn Listener in this environment. However, I have not tested this myself and therefore would not count it as a definitive solution. Even if it works, isn't that too much effort just to name your listener with more than 15 characters?

I ended up making a slight change in my naming convention so that the listener name would not exceed 15 characters.
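
For reference, once the name fits within 15 characters, the T-SQL itself is straightforward. Something along these lines, where the availability group name, listener name, and IP details are placeholders ('AGL-SALES-PRD' is 13 characters):

ALTER AVAILABILITY GROUP [YourAvailabilityGroup]
ADD LISTENER N'AGL-SALES-PRD'
(
    WITH IP ((N'10.0.0.50', N'255.255.255.0')),
    PORT = 1433
);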


Resources:


Configure a listener for an Always On availability group