Search This Blog

Wednesday, July 19, 2023

Tracking the Inaccurate Cardinality Estimates

Tracking the Inaccurate Cardinality Estimates

When it comes to query performance, lately, Cardinality Estimates seem to have been a hot topic with each new release of SQL Server. They are making the query optimizer smarter, helping it generate more efficient query execution plans, and making queries run 

In fact, one might argue that the SQL Server query optimizer, being 'cost-based,' is fundamentally reliant on cardinality estimation, using per-attribute summary statistics (histograms). I might venture to say that in the real world, cardinality inaccuracies are relatively frequent occurrences that lead to slow queries and unpredictable performance. The following description and warning included with the SSMS 'create Extended Events' wizard should be self-explanatory regarding why cardinality estimates matter.

Inaccurate Cardinality Estimates

Occurs when an operator outputs significantly more rows than estimated by the Query Optimizer. Use this event to identify queries that may be using sub-optimal plans due to cardinality estimate inaccuracy. Using this event can have a significant performance overhead so it should only be used when troubleshooting or monitoring specific problems for brief periods of time.


Wait a minute, according to that description, it's bad for performance when the optimizer underestimates the row count. Does that mean I don't need to worry about it if it overestimates the row count? Actually, overestimation is also bad for performance, perhaps not as severely, but it's still problematic enough that you should not ignore it. When the optimizer overestimates the row count:

  • The optimizer will allocate too much memory to the query, resulting in wasted memory. If this happens frequently, it can starve the SQL Server of memory, leading to spills to disk, hard paging, and so on.
  • The query will be executed with a parallel plan when a serial plan would not only have been faster but also would have avoided the possibility of the infamous CXPACKET waits.
  • The optimizer is more likely to choose scanning the entire index over seeking.


At a very basic level, the Cardinality Estimator is an estimate of the row count for each operation in the query, particularly for columns used in filter conditions (also known as the WHERE CLAUSE) and columns used in JOIN PREDICATES.

How are these estimates calculated?


While the internal algorithms evolve and keep changing, in general, the Optimizer uses the statistics created on the underlying objects/tables. Statistics are implicitly or automatically created when any index is created, and you can also create them separately, independent of any index. Additionally, SQL Server will automatically create statistics it needs if the AUTO_CREATE_STATISTICS setting is ON for the database. There are three key pieces of information in statistics:

  • The header includes, among other things, the last update date, the number of rows sampled, etc. This information can be obtained from the system DMV sys.stats and the function sys.dm_db_stats_properties.
  • Index density measures the uniqueness of a column. Low density indicates higher uniqueness, and vice versa. This information can be obtained from the system function sys.dm_db_stats_histogram.
  • Histogram shows the number of distinct values in a column and the distribution of the data in it. For example, consider the distribution of data for the Code column with 5 distinct values:



If there is an index (and, consequently, statistics on column values) on the Code column, assuming the stats are up-to-date and accurate, the Optimizer is (rightfully so) more likely to use a Seek against that index for the following query because it knows from the statistics that the index contains much fewer records for Code = 'E'.

SELECT * FROM Blood_Type WHERE Code = 'E';


If the filter condition instead contains Code = 'F', the Optimizer is more likely to not only ignore the index but also perform a full scan on the table/clustered index!

While there have been significant improvements in how the CE's are calculated, there will still be instances where those estimates will differ, sometimes significantly, from the reality, i.e., the actual row counts.

Before I get to the Extended Events, let me share couple queries that I use to look at the meta data for the statistics and their histograms.

SELECT -- TOP 10
	   SCHEMA_NAME(o.schema_id) [object_owner],
	   o.name [object_name],
	   s.name [stats_name],
	   (sp.rows_sampled * 100)/ sp.rows percent_sampled,
       sp.*, 
       s.auto_created,
	   s.user_created,
	   s.has_filter,
	   s.is_temporary,
	   s.is_incremental
FROM sys.objects o
inner join sys.stats s ON s.object_id = o.object_id
cross apply sys.dm_db_stats_properties(s.object_id, stats_id) sp
WHERE	1=1
	AND o.is_ms_shipped = 0
	AND s.auto_created = 0
--	AND o.name = 'LookupCodes'
order by sp.rows desc;  

The query retrieves information about statistics for user-created objects, excluding system objects, and sorts the result by the number of rows in the statistics in descending order. It also calculates the percentage of rows sampled for each statistic.












Get index/statistics histograms:

SELECT top 100 
		[schemas].name [schema_name],
		t.name TABLE_NAME,
		i.name index_name,
STATS_DATE(i.object_id, i.index_id) stats_update_date,
INDEX_COL([schemas].name + '.' + t.name, i.index_id, 1) ind_column, h.* FROM sys.tables t INNER JOIN sys.schemas [schemas] ON t.schema_id = [schemas].schema_id INNER JOIN sys.indexes i ON i.object_id = t.object_id INNER JOIN sys.stats s ON i.object_id = s.object_id AND i.index_id = s.stats_id INNER JOIN sys.stats_columns sc ON s.object_id = sc.object_id AND s.stats_id = sc.stats_id AND sc.stats_column_id = 1 -- inner join sys.columns c on c.object_id = s.object_id and c.column_id = sc.column_id CROSS apply sys.dm_db_stats_histogram(i.object_id, i.index_id) h WHERE 1=1 AND t.is_ms_shipped = 0 -- AND i.is_unique = 0 -- AND i.index_id > 1 AND t.name = 'tblxxx' AND i.name = 'idx_nc_tblxxx1';



The query retrieves detailed information about the specified non-clustered index, including its name, schema, last update date for statistics, the name of the first column in the index, and histogram data.











Now, the remainder of the article will focus on leveraging the Extended Events feature to detect these irregularities, often referred to as 'skewed cardinality estimates' in SQL Server documentation. 

And I would like to also emphasize the importance of selective data capture to avoid excessive information gathering.





Creating the extended event

(If you prefer TSQL scripts instead of a GUI Wizard, you find them towards the end of this article.)

To create the extended event, connect to the SQL Server instance and expand the Management--->Extended Events and right click on the Sessions to launch the New Session Wizard:















Click Next on the Introduction page, if it appears on your screen.

On the next page, enter a name for the XE session, I am calling it xe_inaccurate_cardinality_estimate.


Click Next and on the next page, select Do Not Use a Template.


On the next page, make sure to first click on the Channel drop down and check Debug. This is important. Otherwise the "inaccurate_cardinality_estimate" event will not get displayed.












In the Event Library search box, type "inaccurate_cardinality_estimate" and click on the event name to select it. Click on > button to add the event.











Click Next that will bring up option to capture Global Fields.  Here, I will check boxes next to database_id, database_name, sql_text and plan_handle. Click Next.



In the following screen, you can apply filters to refine the results as needed, such as by specifying a filter based on the database_id. I'll be adding a filter that selects events where the actual row count exceeds 1000. Without such a filter, SQL Server might capture events even when the estimated row count is 1 and the actual count is 2. Would you consider this a significant deviation from the estimate? It certainly is from the perspective of Extended Events (XE).

To enhance the effectiveness and practicality of this approach, you may also consider adding an additional filter to capture events where the estimated row count significantly differs from the actual row count. However, for now, I'll stick with filtering based on actual row counts only.

 







'




Click Next, which brings up the page to select storage location for the capture events.










Check box in front of "Save data to a file...", which should automatically be filled in with the name of the event "xe_inaccurate_cardinality_estimate" as the storage file name. You could enter a complete path to the event file, if not by default the event files will be stored under \MSSQL\Log within the installation folder of the instance.


Adjust the Maximum file size, rollover settings to your preferences then click Next

Click Finish to create the event.


Start the extended event

Start the newly created XE session by clicking on it then select Start Session:














Viewing the event data

Right click again on the XE and select Watch Live Data:













You can let the XE run for a while or run some test queries of your own, which is what I did to trigger a test event. You can find the script towards the end of this article.

This screenshot is of an event generated from my test script:









Normally you won't be watching the live data continuously so you can also view all events by right clicking on the event file then select View Target Data:






You can also use a query to read the events from event file.  You can find the script I use towards the end of this article.



Clean Up


You can right click on the extended event then select Stop Session.
Right click on the event again and select Delete to remove it.

Note: Deleting the extended event won't delete the event files on the SQL Server.



Addressing Skewed Cardinality Estimates (CEs)

To rectify skewed Cardinality Estimates (CEs), it's crucial to analyze and optimize your database. In the captured event data, closely examine the queries and the tables or objects referenced within them. Ensure that these database elements have appropriate indexes and up-to-date statistics—avoiding outdated or stale statistics. If necessary, update or refresh the statistics to maintain accuracy.


The availability of current and accurate statistics, whether they're implicitly created with an index, manually generated by a Database Administrator (DBA), or automatically created through AUTO_CREATE_STATISTICS, is fundamental to enhancing CE accuracy.

Here are some general recommendations:


  • Check for Missing Indexes: Utilize well-known DMV queries available online to identify any missing indexes.
  • Enable Database Settings: Enable the database settings AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS to ensure that statistics are automatically generated and kept up to date.
  • Regular Index and Statistics Maintenance: Implement a regular schedule for rebuilding or refreshing your indexes and statistics. You can explore the index and statistics maintenance script by Ola Hallengren at https://ola.hallengren.com/ for guidance.
  • Stay Updated: Stay current with SQL Server upgrades and patches. These updates often include enhancements to queries, the query optimizer, and how CE calculations are utilized.
By following these best practices, you can effectively address skewed Cardinality Estimates and improve the overall performance and accuracy of your SQL Server queries.



TSQL Scripts


Create the extended event

If you need to create this XE more than once, either on same sql server or multiple sql servers, a script is a better way to go.

-- **** FOR TESTING ONLY IN A NON-PRODUCTION SERVER ****

-- CREATE XE
if exists (SELECT * FROM sys.server_event_sessions where name = 'xe_inaccurate_cardinality_estimate')
		DROP EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER 
GO
CREATE EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER 
ADD EVENT sqlserver.inaccurate_cardinality_estimate
ADD TARGET package0.event_file(SET filename=N'xe_inaccurate_cardinality_estimate')
WITH 
(
		MAX_MEMORY=4096 KB,
		EVENT_RETENTION_MODE=ALLOW_MULTIPLE_EVENT_LOSS,
		MAX_DISPATCH_LATENCY=300 SECONDS,
		TRACK_CAUSALITY=OFF,
		STARTUP_STATE=OFF
)
GO

-- Start the session
ALTER EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER STATE=START;


-- Stop the session after the workload is executed
-- ALTER EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER STATE=STOP;

/*-- To drop the XE
if exists (SELECT * FROM sys.server_event_sessions where name = 'xe_inaccurate_cardinality_estimate')
		DROP EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER 
*/


Trigger an event


Generate sample data/query to trigger the inaccurate cardinality estimates event:
   
/* **** FOR TESTING ONLY IN A NON-PRODUCTION SERVER ****
-- disable the AUTO_CREATE_STATISTICS
USE [master]
GO
ALTER DATABASE [AdminDBA] SET AUTO_CREATE_STATISTICS OFF
GO
*/

set nocount on
if object_id('ce_test_tbl_people') is not null drop table ce_test_tbl_people
go
-- create test table
create table ce_test_tbl_people
(
		id int identity primary key,
		name varchar(200) not null,
		zipcode int not null
);
-- Insert test data
go
insert into ce_test_tbl_people values('ABC', 12345)
GO 100
insert into ce_test_tbl_people values('ABC', 45678)
GO 200
insert into ce_test_tbl_people values('ABC', 93213)
GO 300
insert into ce_test_tbl_people values('ABC', 74616)
GO 400
insert into ce_test_tbl_people values('ABC', 48287)
GO 5000
insert into ce_test_tbl_people values('ABC', 48287)
GO 5000
insert into ce_test_tbl_people values('ABC', 48287)
GO 5000
insert into ce_test_tbl_people values('ABC', 27646)
GO 5


/* 
RUN A QUERY TO TRIGGER INACCURATE CARDINALITY ESTIATES EVENT
MAKE SURE THE XE session for xe_inaccurate_cardinality_estimate is started
*/
SELECT * FROM ce_test_tbl_people WHERE ZIPCODE = 48287;

View the event data


You can write a query to read events from the event files, which is my preferred method. Here is one such query:

--  READ THE XE FILEs
;with cte as
(
		SELECT top 100 
				@@SERVERNAME [SQL Server],
				[file_name],
                cast(event_data as xml) event_data
		FROM sys.fn_xe_file_target_read_file('xe_inaccurate_cardinality_estimate*.xel', null, null, null)

)
select 
		 [SQL Server] 
		,dn.v.value('.', 'varchar(100)') [Database]
		,ar.v.value('.', 'int') [Actual Rows]
		,er.v.value('.', 'int') [Estimated Rows]
		,st.v.value('.', 'varchar(8000)') [SQL]
		,[file_name] [Event File]
		,cte.event_data [Event Data]
		,dn.v.value('../@timestamp', 'varchar(100)') [TimeStamp]
		,CONCAT('0x', ph.v.value('.', 'varchar(200)')) [Plan Handle]

from cte
CROSS APPLY event_data.nodes('//event/data[@name = "actual_rows"]')     as ar(v)
CROSS APPLY event_data.nodes('//event/data[@name = "estimated_rows"]')  as er(v)
CROSS APPLY event_data.nodes('//event/action[@name = "database_name"]') as dn(v)
CROSS APPLY event_data.nodes('//event/action[@name = "plan_handle"]')   as ph(v)
CROSS APPLY event_data.nodes('//event/action[@name = "sql_text"]')      as st(v)
-- where ar.v.value('.', 'int') > 1000
;

-- select * from sys.dm_exec_cached_plans where plan_handle = 0x303630303031303032393830376432613630353362656434386430313030


Stop and delete the extended event

-- DROP THE XE xe_inaccurate_cardinality_estimate
IF exists (SELECT * FROM sys.server_event_sessions where name = 'xe_inaccurate_cardinality_estimate')
BEGIN
		IF exists(select * from sys.dm_xe_sessions where name = 'xe_inaccurate_cardinality_estimate')
				ALTER EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER STATE=STOP;

		DROP EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER 
END


Delete the event files

If you leave the event session running,  over time the number of event files will accumulate on the sql server so whether to reclaim the disk space or you have completed your analysis and simply no longer need event files,  you can utilize this PowerShell script to remotely delete the event files on the SQL Server. Make sure to replace the value for $sql_instnace_name variable before trying the script:

$sql_instance_name = "TestSQLServer\TestInstance"

$sql_query = "SELECT distinct @@SERVERNAME [SQL_Instance],
                SERVERPROPERTY('MachineName') [Computer_Name],
                [file_name]
FROM sys.fn_xe_file_target_read_file('xe_inaccurate_cardinality_estimate*.xel', null, null, null);
"

#If you are using a SQL login with password, at the end of the Invoke-Sqlcmd add: -Username '<SQL Login>' -Password '<password>'
$sql_xe_files      = Invoke-Sqlcmd -ServerInstance $sql_instance_name -Query $sql_query
$sql_xe_file_names = $sql_xe_files.file_name
$sql_host_name     = $sql_xe_files.Computer_Name[1]
$sql_host_fqdn     = [System.Net.Dns]::GetHostEntry($sql_host_name).HostName


$file_object = Invoke-Command -ArgumentList $sql_xe_file_name -computername $sql_host_fqdn -scriptBlock {get-item $Using:sql_xe_file_names}

#Uncomment the next line to actually remove the event files on the server
#$file_object = Invoke-Command -ArgumentList $sql_xe_file_name -computername $sql_host_fqdn -scriptBlock {remove-item $Using:sql_xe_file_names}