When it comes to query performance, cardinality estimates have been a hot topic with each new release of SQL Server. Improvements to cardinality estimation make the query optimizer smarter, helping it generate more efficient query execution plans and making queries run faster.
The SQL Server query optimizer, being 'cost-based', fundamentally relies on cardinality estimation, using per-attribute summary statistics (histograms). This principle applies to any Relational Database Management System (RDBMS), such as Oracle, that employs a cost-based optimizer.
The following description and warning, included with the SSMS 'create Extended Events' wizard, should be self-explanatory as to why cardinality estimates matter.
Occurs when an operator outputs significantly more rows than estimated by the Query Optimizer. Use this event to identify queries that may be using sub-optimal plans due to cardinality estimate inaccuracy. Using this event can have a significant performance overhead so it should only be used when troubleshooting or monitoring specific problems for brief periods of time.
When cardinality estimates are off, any of the following can happen:
- The optimizer may allocate too much memory to the query, resulting in wasted memory; if this happens frequently, it can starve the SQL Server of memory. Underestimates have the opposite effect: the memory grant is too small, leading to spills to disk, hard paging, and so on.
- The query will be executed with a parallel plan when a serial plan would not only have been faster but also would have avoided the possibility of the infamous CXPACKET waits.
- The optimizer is more likely to choose scanning the entire index over seeking.
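One quick way to see the gap between estimated and actual row counts for a single query is SET STATISTICS PROFILE (a sketch; the Blood_Type table is the example used later in this article):

```sql
-- Run the query with the plan profile returned alongside the results
SET STATISTICS PROFILE ON;
SELECT * FROM Blood_Type WHERE Code = 'E';
SET STATISTICS PROFILE OFF;
-- Each plan operator is returned with both a Rows (actual) column and an
-- EstimateRows column; a large gap between the two for the same operator
-- signals a cardinality estimation problem.
```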
How are these estimates calculated?
- The header includes, among other things, the date the statistics were last updated, the number of rows sampled, etc. This information can be obtained from the system catalog view sys.stats and the function sys.dm_db_stats_properties.
- Index density measures the uniqueness of a column: low density indicates higher uniqueness, and vice versa. This information can be obtained from DBCC SHOW_STATISTICS with the DENSITY_VECTOR option.
- The histogram shows the number of distinct values in a column and the distribution of the data in it. This information can be obtained from the system function sys.dm_db_stats_histogram. For example, consider the distribution of data for the Code column with 5 distinct values:
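As an illustration (the row counts below are hypothetical, chosen only to show skew), the distribution could be examined with a simple aggregate:

```sql
-- Hypothetical, skewed distribution for Blood_Type.Code (5 distinct values):
-- e.g. 'A' = 40,000 rows ... 'E' = 1,000 rows, and no rows at all for 'F'
SELECT Code, COUNT(*) AS row_count
FROM Blood_Type
GROUP BY Code
ORDER BY row_count DESC;
```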
If there is an index on the Code column (and, consequently, statistics on its values), and assuming the stats are up to date and accurate, the Optimizer is, rightfully so, more likely to use a Seek against that index for the following query, because it knows from the statistics that the index contains far fewer records for Code = 'E'.
SELECT * FROM Blood_Type WHERE Code = 'E';
If the filter condition instead contains Code = 'F', the Optimizer is more likely to not only ignore the index but also perform a full scan on the table/clustered index!
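If such an index does not already exist, it can be created and its statistics refreshed so the histogram reflects the real distribution (a sketch; the index name is an assumption):

```sql
-- Assumed index name, for illustration only
CREATE NONCLUSTERED INDEX idx_nc_blood_type_code ON Blood_Type (Code);

-- Refresh the statistics with a full scan for the most accurate histogram
UPDATE STATISTICS Blood_Type idx_nc_blood_type_code WITH FULLSCAN;
```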
SELECT -- TOP 10
    SCHEMA_NAME(o.schema_id) [object_owner],
    o.name [object_name],
    s.name [stats_name],
    (sp.rows_sampled * 100) / sp.rows percent_sampled,
    sp.*,
    s.auto_created,
    s.user_created,
    s.has_filter,
    s.is_temporary,
    s.is_incremental
FROM sys.objects o
INNER JOIN sys.stats s ON s.object_id = o.object_id
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE 1=1
  AND o.is_ms_shipped = 0
  AND s.auto_created = 0
  -- AND o.name = 'LookupCodes'
ORDER BY sp.rows DESC;
SELECT TOP 100
    [schemas].name [schema_name],
    t.name table_name,
    i.name index_name,
    STATS_DATE(i.object_id, i.index_id) stats_update_date,
    INDEX_COL([schemas].name + '.' + t.name, i.index_id, 1) ind_column,
    h.*
FROM sys.tables t
INNER JOIN sys.schemas [schemas] ON t.schema_id = [schemas].schema_id
INNER JOIN sys.indexes i ON i.object_id = t.object_id
INNER JOIN sys.stats s ON i.object_id = s.object_id AND i.index_id = s.stats_id
INNER JOIN sys.stats_columns sc ON s.object_id = sc.object_id AND s.stats_id = sc.stats_id AND sc.stats_column_id = 1
-- INNER JOIN sys.columns c ON c.object_id = s.object_id AND c.column_id = sc.column_id
CROSS APPLY sys.dm_db_stats_histogram(i.object_id, i.index_id) h
WHERE 1=1
  AND t.is_ms_shipped = 0
  -- AND i.is_unique = 0
  -- AND i.index_id > 1
  AND t.name = 'tblxxx'
  AND i.name = 'idx_nc_tblxxx1';
Creating the extended event
To create the extended event, connect to the SQL Server instance in SSMS, expand Management > Extended Events, and right-click Sessions to launch the New Session Wizard:
Click Next on the Introduction page, if it appears on your screen.
On the next page, enter a name for the XE session; I am calling it xe_inaccurate_cardinality_estimate.
On the next page, make sure to first click the Channel drop-down and check Debug. This is important; otherwise the "inaccurate_cardinality_estimate" event will not be displayed.
In the Event Library search box, type "inaccurate_cardinality_estimate" and click on the event name to select it. Click the > button to add the event.
Click Next, which brings up the page to select storage location for the capture events.
Check the box in front of "Save data to a file...", which should automatically be filled in with the name of the event, xe_inaccurate_cardinality_estimate, as the storage file name. You can enter a complete path to the event file; if you don't, by default the event files are stored under \MSSQL\Log within the installation folder of the instance.
Adjust the maximum file size and rollover settings to your preferences, then click Next.
Click Finish to create the event.
Start the extended event
Start the newly created XE session by right-clicking on it and selecting Start Session:
Viewing the event data
Right click again on the XE and select Watch Live Data:
You can let the XE run for a while or run some test queries of your own, which is what I did to trigger a test event. You can find the script towards the end of this article.
This screenshot is of an event generated from my test script:
Normally you won't be watching the live data continuously, so you can also view all captured events by right-clicking the event file and selecting View Target Data:
Addressing Skewed Cardinality Estimates (CEs)
To rectify skewed Cardinality Estimates (CEs), it's crucial to analyze and optimize your database. In the captured event data, closely examine the queries and the tables or objects referenced within them. Ensure that these database elements have appropriate indexes and up-to-date statistics—avoiding outdated or stale statistics. If necessary, update or refresh the statistics to maintain accuracy.
- Check for Missing Indexes: Utilize well-known DMV queries available online to identify any missing indexes.
- Enable Database Settings: Enable the database settings AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS to ensure that statistics are automatically generated and kept up to date.
- Regular Index and Statistics Maintenance: Implement a regular schedule for rebuilding or refreshing your indexes and statistics. You can explore the index and statistics maintenance script by Ola Hallengren at https://ola.hallengren.com/ for guidance.
- Stay Updated: Stay current with SQL Server upgrades and patches. These updates often include enhancements to queries, the query optimizer, and how CE calculations are utilized.
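The database settings mentioned above can be checked and enabled with a few lines of T-SQL (a sketch; the database name is a placeholder):

```sql
-- Check the current settings (1 = enabled)
SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
FROM sys.databases
WHERE name = 'YourDatabase';

-- Enable them if needed
ALTER DATABASE [YourDatabase] SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE [YourDatabase] SET AUTO_UPDATE_STATISTICS ON;

-- One-off refresh of all statistics in the current database
EXEC sp_updatestats;
```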
TSQL Scripts
Create the extended event
If you need to create this XE more than once, either on the same SQL Server or on multiple servers, a script is the better way to go.
-- **** FOR TESTING ONLY IN A NON-PRODUCTION SERVER ****
-- CREATE XE
IF EXISTS (SELECT * FROM sys.server_event_sessions WHERE name = 'xe_inaccurate_cardinality_estimate')
    DROP EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER
GO
CREATE EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER
ADD EVENT sqlserver.inaccurate_cardinality_estimate
ADD TARGET package0.event_file (SET filename = N'xe_inaccurate_cardinality_estimate')
WITH (
    MAX_MEMORY = 4096 KB,
    EVENT_RETENTION_MODE = ALLOW_MULTIPLE_EVENT_LOSS,
    MAX_DISPATCH_LATENCY = 300 SECONDS,
    TRACK_CAUSALITY = OFF,
    STARTUP_STATE = OFF
)
GO

-- Start the session
ALTER EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER STATE = START;

-- Stop the session after the workload is executed
-- ALTER EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER STATE = STOP;

/* -- To drop the XE
IF EXISTS (SELECT * FROM sys.server_event_sessions WHERE name = 'xe_inaccurate_cardinality_estimate')
    DROP EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER
*/
Trigger an event
/* **** FOR TESTING ONLY IN A NON-PRODUCTION SERVER ****
-- Disable AUTO_CREATE_STATISTICS
USE [master]
GO
ALTER DATABASE [AdminDBA] SET AUTO_CREATE_STATISTICS OFF
GO
*/
SET NOCOUNT ON
IF OBJECT_ID('ce_test_tbl_people') IS NOT NULL
    DROP TABLE ce_test_tbl_people
GO
-- Create test table
CREATE TABLE ce_test_tbl_people
(
    id      int identity PRIMARY KEY,
    name    varchar(200) NOT NULL,
    zipcode int NOT NULL
);
GO
-- Insert test data
INSERT INTO ce_test_tbl_people VALUES ('ABC', 12345)
GO 100
INSERT INTO ce_test_tbl_people VALUES ('ABC', 45678)
GO 200
INSERT INTO ce_test_tbl_people VALUES ('ABC', 93213)
GO 300
INSERT INTO ce_test_tbl_people VALUES ('ABC', 74616)
GO 400
INSERT INTO ce_test_tbl_people VALUES ('ABC', 48287)
GO 5000
INSERT INTO ce_test_tbl_people VALUES ('ABC', 48287)
GO 5000
INSERT INTO ce_test_tbl_people VALUES ('ABC', 48287)
GO 5000
INSERT INTO ce_test_tbl_people VALUES ('ABC', 27646)
GO 5
/* RUN A QUERY TO TRIGGER AN INACCURATE CARDINALITY ESTIMATES EVENT.
   MAKE SURE THE XE SESSION xe_inaccurate_cardinality_estimate IS STARTED. */
SELECT * FROM ce_test_tbl_people WHERE zipcode = 48287;
View the event data
You can write a query to read events from the event files, which is my preferred method. Here is one such query:
-- READ THE XE FILEs
;WITH cte AS
(
    SELECT TOP 100
        @@SERVERNAME [SQL Server],
        [file_name],
        CAST(event_data AS xml) event_data
    FROM sys.fn_xe_file_target_read_file('xe_inaccurate_cardinality_estimate*.xel', null, null, null)
)
SELECT
    [SQL Server],
    dn.v.value('.', 'varchar(100)') [Database],
    ar.v.value('.', 'int') [Actual Rows],
    er.v.value('.', 'int') [Estimated Rows],
    st.v.value('.', 'varchar(8000)') [SQL],
    [file_name] [Event File],
    cte.event_data [Event Data],
    dn.v.value('../@timestamp', 'varchar(100)') [TimeStamp],
    CONCAT('0x', ph.v.value('.', 'varchar(200)')) [Plan Handle]
FROM cte
CROSS APPLY event_data.nodes('//event/data[@name = "actual_rows"]') AS ar(v)
CROSS APPLY event_data.nodes('//event/data[@name = "estimated_rows"]') AS er(v)
CROSS APPLY event_data.nodes('//event/action[@name = "database_name"]') AS dn(v)
CROSS APPLY event_data.nodes('//event/action[@name = "plan_handle"]') AS ph(v)
CROSS APPLY event_data.nodes('//event/action[@name = "sql_text"]') AS st(v)
-- WHERE ar.v.value('.', 'int') > 1000
;
-- SELECT * FROM sys.dm_exec_cached_plans WHERE plan_handle = 0x303630303031303032393830376432613630353362656434386430313030
Stop and delete the extended event
-- DROP THE XE xe_inaccurate_cardinality_estimate
IF EXISTS (SELECT * FROM sys.server_event_sessions WHERE name = 'xe_inaccurate_cardinality_estimate')
BEGIN
    IF EXISTS (SELECT * FROM sys.dm_xe_sessions WHERE name = 'xe_inaccurate_cardinality_estimate')
        ALTER EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER STATE = STOP;
    DROP EVENT SESSION [xe_inaccurate_cardinality_estimate] ON SERVER
END
Delete the event files
If you leave the event session running, the event files will accumulate on the SQL Server over time. Whether you want to reclaim the disk space, or you have completed your analysis and simply no longer need the event files, you can use this PowerShell script to delete them remotely. Make sure to replace the value of the $sql_instance_name variable before running the script:
$sql_instance_name = "TestSQLServer\TestInstance"
$sql_query = "SELECT DISTINCT @@SERVERNAME [SQL_Instance], SERVERPROPERTY('MachineName') [Computer_Name], [file_name]
              FROM sys.fn_xe_file_target_read_file('xe_inaccurate_cardinality_estimate*.xel', null, null, null);"

# If you are using a SQL login with a password, append to the Invoke-Sqlcmd line: -Username '<SQL Login>' -Password '<password>'
$sql_xe_files = Invoke-Sqlcmd -ServerInstance $sql_instance_name -Query $sql_query
$sql_xe_file_names = $sql_xe_files.file_name
$sql_host_name = $sql_xe_files.Computer_Name[1]
$sql_host_fqdn = [System.Net.Dns]::GetHostEntry($sql_host_name).HostName

# List the event files on the remote server ($Using: passes the local variable into the remote script block)
$file_object = Invoke-Command -ComputerName $sql_host_fqdn -ScriptBlock { Get-Item $Using:sql_xe_file_names }

# Uncomment the next line to actually remove the event files on the server
# $file_object = Invoke-Command -ComputerName $sql_host_fqdn -ScriptBlock { Remove-Item $Using:sql_xe_file_names }